Abstract

We consider a method for approximate inference in hidden Markov models (HMMs). The method circumvents the need to evaluate conditional densities of observations given the hidden states. It may be considered an instance of Approximate Bayesian Computation (ABC), and it involves the introduction of auxiliary variables valued in the same space as the observations. The quality of the approximation may be controlled to arbitrary precision through a parameter $\epsilon > 0$. We provide theoretical results which quantify, in terms of $\epsilon$, the ABC error in approximating expectations of additive functionals with respect to the smoothing distributions. Under regularity assumptions, this error is $O(n\epsilon)$, where n is the number of time steps over which smoothing is performed. For numerical implementation we adopt the forward-only sequential Monte Carlo (SMC) scheme of [16] and quantify the combined error from the ABC and SMC approximations. These are among the first quantitative results for ABC methods which jointly treat the ABC and simulation errors, with a finite number of data and simulated samples. When the HMM has unknown static parameters, we consider particle Markov chain Monte Carlo (PMCMC) methods [2] for batch statistical inference.

Key-words: Smoothing, Hidden Markov Models, Approximate Bayesian Computation, Sequential Monte 
Carlo, Markov chain Monte Carlo. 



1 Introduction 

Hidden Markov Models are widely used in statistics; see [9] for a recent overview. An HMM is a pair of discrete-time stochastic processes, $\{X_n\}_{n\ge0}$ and $\{Y_n\}_{n\ge0}$, where $X_n\in\mathbb{R}^{d_x}$ is unobserved and $Y_n\in\mathbb{R}^{d_y}$ is observed. The hidden process $\{X_n\}_{n\ge0}$ is a Markov chain with initial density $\eta_0$ at time 0 and transition density $f(x_{n-1},x_n)$, i.e.

$$\mathbb{P}(X_0\in A)=\int_A \eta_0(x)\,dx \quad\text{and}\quad \mathbb{P}(X_n\in A\mid X_{n-1}=x_{n-1})=\int_A f(x_{n-1},x_n)\,dx_n,\quad n\ge1, \tag{1}$$

where $A\subseteq\mathbb{R}^{d_x}$ and $dx_n$ is a dominating $\sigma$-finite measure. Each observation $Y_n$ is conditionally independent of the other variables given $\{X_n\}_{n\ge0}$, and its conditional distribution is specified by a density $g(x_n,y_n)$, i.e.

$$\mathbb{P}\big(Y_n\in B\mid\{X_k\}_{k\ge0}=\{x_k\}_{k\ge0}\big)=\int_B g(x_n,y_n)\,dy_n,\quad n\ge0, \tag{2}$$

with $B\subseteq\mathbb{R}^{d_y}$ and $dy_n$ a dominating $\sigma$-finite measure. We remark that $\eta_0$, $f(x_{n-1},x_n)$ and $g(x_n,y_n)$ may depend upon time-independent parameters, which we term static parameters.

A variety of inference and estimation tasks for HMMs involve the computation of the smoothing functional

$$\mathbb{E}[V_n(X_{0:n})\mid y_{0:n}],\qquad V_n:\mathbb{R}^{(n+1)d_x}\to\mathbb{R}, \tag{3}$$

where $V_n(x_{0:n})=\sum_{p=0}^{n}v_p(x_{p-1:p})$, $v_p:\mathbb{R}^{2d_x}\to\mathbb{R}$ for $1\le p\le n$, $x_{-1:0}=x_0$ with $v_0:\mathbb{R}^{d_x}\to\mathbb{R}$, and the expectation is w.r.t. the joint smoothing distribution. For example, the cases $v_p(x_p)=x_p/(n+1)$ and $v_p(x_{p-1:p})=x_{p-1}x_p/(n+1)$ correspond to the time-averaged posterior mean and first-order auto-covariance. When the HMM includes unknown static parameters, expectations of additive functionals play a central role in expectation-maximisation (EM) algorithms and in the calculation of score vectors; see [16] for some discussion.
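The additive structure of $V_n$ can be made concrete with a short Python sketch that evaluates it along a single path and reproduces the two examples just given. The function name `additive_functional` and its calling convention are hypothetical. So is the choice $v_0\equiv0$ for the auto-covariance example, which has no lag at p = 0; it is an illustrative assumption.

```python
import numpy as np


def additive_functional(x, v0, v):
    """Evaluate V_n(x_{0:n}) = v_0(x_0) + sum_{p=1}^n v_p(x_{p-1}, x_p).

    x  : array of shape (n + 1, d_x) holding one path x_{0:n}
    v0 : callable, v0(x_0) -> array
    v  : callable, v(p, x_{p-1}, x_p) -> array (the p-th summand)
    """
    total = np.asarray(v0(x[0]), dtype=float)
    for p in range(1, len(x)):
        total = total + v(p, x[p - 1], x[p])
    return total


# The two functionals from the text, for a path with n + 1 = 3 states.
path = np.array([[1.0], [2.0], [3.0]])
n1 = len(path)  # n + 1
mean = additive_functional(path, lambda x0: x0 / n1,
                           lambda p, xp, xc: xc / n1)
# Assumption: v_0 = 0 for the lag-one product, since x_{-1} does not exist.
autocov = additive_functional(path, lambda x0: np.zeros_like(x0),
                              lambda p, xp, xc: xp * xc / n1)
```

Here `mean` is (1 + 2 + 3)/3 = 2 and `autocov` is (1·2 + 2·3)/3 = 8/3.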



In practice, the expectation in (3) can rarely be computed exactly, and one resorts to numerical integration techniques such as SMC; see [TH] for a recent overview. These methods typically rely on the ability to evaluate the conditional density g(x,y) pointwise. The methods we consider obtain an approximation of (3) without performing any such evaluations, in a principled manner that admits control of the approximation error. There are two motivations for avoiding such evaluations. First, for some models g(x,y) may simply not have a closed-form expression. Second, in some situations evaluating g(x,y) may be very expensive.

The general technique we consider may be interpreted as an instance of ABC. A recent review of this class of methods can be found in [21]. In the context of HMMs, ABC has been considered by [12], [22]; see also [IT]. The approximation (given in Section 2.1) was introduced by [25], and it still requires numerical (e.g. SMC) methods to fit it. Alternative ideas include nonparametric filtering [5D] and the related convolution particle filter [5]; see [22] for some discussion relative to ABC. As noted by [24], in the scenario where there is a fixed amount of data and a fixed number of simulated samples, there is a distinct lack of theoretical results which quantify the combined ABC and numerical (SMC) errors. We provide some of the first results in this context (see [7], [22] for other results).
Our main objectives are to 

1. investigate, both theoretically and empirically, the error associated with the approximation scheme we propose 

2. demonstrate how this scheme can be used to perform smoothing and to estimate static parameters in HMMs 
from a batch of data 

Regarding point 1, the error has two components. The first arises from the introduction of an auxiliary HMM, incorporating auxiliary variables valued in the same space as the observations $\{y_n\}_{n\ge0}$. Smoothing expectations under this auxiliary model, which we write as $\mathbb{E}_\epsilon[V_n(X_{0:n})\mid y_{0:n}]$, may be taken as approximations of (3). The degree of approximation is controlled through a parameter $\epsilon$, and this error should vanish as $\epsilon\to0$. In turn, expectations under the auxiliary model admit efficient numerical approximation using SMC techniques, without evaluation of g(x,y). The second component of the overall error arises from this Monte Carlo scheme and is controlled through a sample size parameter N; it vanishes as $N\to+\infty$. The SMC method adopted is the forward-only implementation, from [16], of the forward-filtering backward-smoothing (FFBS) method [3], [ST]. This is currently one of the most accurate methods for SMC approximation of smoothed additive functionals.

We write the SMC estimate of the smoothing expectation under the auxiliary HMM as $\Xi_n[\epsilon,N,V_n,y_{0:n}]$. Similarly, we denote the SMC estimate of the smoothing expectation under the original HMM (1)-(2) as $\Xi_n[N,V_n,y_{0:n}]$. In the numerical studies in Section 5, the errors associated with these SMC estimates are denoted $e_n^{N,\epsilon}$ and $e_n^{N}$, respectively.

The overall error associated with the SMC estimate of the smoothing expectation under the auxiliary HMM may be decomposed as

$$\mathbb{E}[V_n(X_{0:n})\mid y_{0:n}]-\Xi_n[\epsilon,N,V_n,y_{0:n}]=\Big(\mathbb{E}[V_n(X_{0:n})\mid y_{0:n}]-\mathbb{E}_\epsilon[V_n(X_{0:n})\mid y_{0:n}]\Big)+\Big(\mathbb{E}_\epsilon[V_n(X_{0:n})\mid y_{0:n}]-\Xi_n[\epsilon,N,V_n,y_{0:n}]\Big). \tag{4}$$

The first difference on the right of (4) is a deterministic error, and the second is a stochastic error. We provide a theoretical analysis of these two error terms, which shows how the interplay between $\epsilon$, n and N controls the overall quality of the approximation. To an extent, these theoretical results are also studied empirically.

Regarding point 2, we show how the approximation scheme can be incorporated into a particle MCMC scheme in order to estimate static parameters of the HMM. Particle MCMC uses SMC techniques to generate proposals, for example for the hidden states of the HMM. Here, we use the particle marginal Metropolis-Hastings algorithm in [5] to sample from the ABC approximation of the HMM, with a prior placed upon the unknown static parameters. In contexts where additive functionals of the hidden state are also of interest (as above), we use the forward-only smoothing technique mentioned above. This makes use of all the simulated samples from the SMC proposal, which is not typically done. A similar idea has been adopted by [26], but with an additional 'backward pass' of the FFBS algorithm, which is not needed here.

In summary, our main contributions are to: 

• quantify, in terms of $\epsilon$ and n, the error $\mathbb{E}[V_n(X_{0:n})\mid y_{0:n}]-\mathbb{E}_\epsilon[V_n(X_{0:n})\mid y_{0:n}]$, henceforth referred to as the ABC error

• quantify, in terms of $\epsilon$, N and n, the error $\mathbb{E}_\epsilon[V_n(X_{0:n})\mid y_{0:n}]-\Xi_n[\epsilon,N,V_n,y_{0:n}]$, henceforth referred to as the SMC error

• provide empirical evidence which illustrates some of these theoretical findings 

This paper is structured as follows. In Section 2 we discuss the ABC approximation and characterize the ABC error. In Section 3, SMC and MCMC simulation techniques for targeting the (ABC) smoother are detailed, and Bayesian estimation of static parameters is also considered. In Section 4 our main theoretical result is given, which combines the ABC and SMC errors discussed above. In Section 5 some numerical studies are presented. Section 6 concludes the article. The proofs are in the appendices.

1.1 Notations 

Given a measurable space $(E,\mathcal{E})$, let $\mu$ be a $\sigma$-finite measure, $K$ a non-negative kernel and $f:E\to\mathbb{R}$ a measurable function. The conventions $\mu(f):=\int_E f(x)\,\mu(dx)$, $K(f)(x):=\int_E K(x,dy)\,f(y)$ and $\mu K(f):=\mu(K(f))$ are used. In addition, let $\mathrm{Osc}(f):=\sup_{(x,y)\in E^2}|f(x)-f(y)|$, and let $\mathcal{B}_b(E)$ be the Banach space of bounded measurable functions on E endowed with the norm $\|f\|:=\sup_{x\in E}|f(x)|$. For two probability measures $\mu_1,\mu_2$ the total variation distance is $\|\mu_1-\mu_2\|_{tv}=\sup_{A\in\mathcal{E}}|\mu_1(A)-\mu_2(A)|$. For a Markov kernel K the Dobrushin coefficient is $\beta(K):=\sup_{(x,y)\in E^2}\|K(x,\cdot)-K(y,\cdot)\|_{tv}$. Given a probability space $(\Omega,\mathcal{F},\mathbb{P})$, we write $\|\cdot\|_p=\mathbb{E}[|\cdot|^p]^{1/p}$ for the $\mathbb{L}_p$ norm under $\mathbb{P}$. For a set $A\in\mathcal{E}$, $\mathbb{I}_A(x)$ is the indicator function.



2 ABC Approximation 

2.1 ABC Smoothing Approximation 

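The joint smoothing density is

$$\hat\eta_n(x_{0:n})=\frac{\prod_{i=0}^{n}g(x_i,y_i)\,\eta_0(x_0)\prod_{j=1}^{n}f(x_{j-1},x_j)}{\int_{\mathbb{R}^{(n+1)d_x}}\prod_{i=0}^{n}g(x_i,y_i)\,\eta_0(x_0)\prod_{j=1}^{n}f(x_{j-1},x_j)\,dx_{0:n}} \tag{5}$$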

where we suppress the dependence on the data on the l.h.s. In most scenarios of practical interest, one can neither calculate this density pointwise nor compute expectations w.r.t. it. As a result, a numerical approximation of (3), via advanced computational tools, is required. This problem is further exacerbated when the density g(x,y) is intractable or very expensive to calculate, that is, when one cannot evaluate it pointwise and no unbiased estimate of it is available. However, we assume throughout that one can sample from the associated distribution $g(x,\cdot)$ for any $x\in\mathbb{R}^{d_x}$. This latter condition is not strictly necessary for any of the subsequent ideas (see [THUS]), but it facilitates a more compact exposition.

To introduce ideas, let us momentarily step away from the setting of HMMs. Suppose one is given observations $y\in\mathcal{D}$ associated to some intractable likelihood $g_\theta(y)$, with $\theta\in\Theta$ an unknown parameter. Then Bayesian inference associated to the posterior $\pi(\theta\mid y)\propto g_\theta(y)\pi(\theta)$ is typically not feasible, even using advanced computational tools; see [23]. To deal with this issue, ABC draws inference from the following modified posterior density on $\Theta\times\mathcal{D}$:

$$\pi_\epsilon(\theta,u\mid y)\propto\pi(\theta)\,g_\theta(u)\,\mathbb{I}_{A_{\epsilon,y}}(u)$$

with $\epsilon>0$ a tolerance level and $u\in\mathcal{D}$ corresponding to some pseudo-observations. The set $A_{\epsilon,y}$ is defined as

$$A_{\epsilon,y}=\{z\in\mathcal{D}:\rho(s(z),s(y))<\epsilon\}$$

where $s:\mathcal{D}\to\mathcal{S}$ represents some summary statistics and $\rho:\mathcal{S}\times\mathcal{S}\to\mathbb{R}_+$ a distance metric.
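The modified posterior above can be sampled by plain rejection. Draw $\theta$ from the prior, simulate pseudo-observations u from $g_\theta$, and keep $\theta$ whenever $u\in A_{\epsilon,y}$. The Python sketch below is a minimal illustration under assumptions of our own: a scalar summary statistic (the sample mean) with $\rho(a,b)=|a-b|$, and a toy model with i.i.d. $\mathcal{N}(\theta,1)$ data and a $\mathcal{N}(0,5^2)$ prior. The function name and arguments are hypothetical, and none of this is part of the methodology of this paper.

```python
import numpy as np


def abc_rejection(y, sample_prior, simulate, summary, eps, n_draws, rng):
    """Rejection sampler for pi_eps(theta | y).

    theta is kept whenever rho(s(u), s(y)) = |s(u) - s(y)| < eps,
    i.e. whenever the pseudo-observations u fall in A_{eps, y}.
    """
    s_y = summary(y)
    kept = []
    for _ in range(n_draws):
        theta = sample_prior(rng)          # theta ~ pi(theta)
        u = simulate(theta, rng)           # u ~ g_theta(.)
        if abs(summary(u) - s_y) < eps:    # indicator of A_{eps, y}
            kept.append(theta)
    return np.array(kept)


rng = np.random.default_rng(0)
y_obs = rng.normal(2.0, 1.0, size=50)
draws = abc_rejection(
    y_obs,
    sample_prior=lambda r: r.normal(0.0, 5.0),
    simulate=lambda theta, r: r.normal(theta, 1.0, size=50),
    summary=np.mean,
    eps=0.1,
    n_draws=20000,
    rng=rng,
)
```

Shrinking `eps` tightens the approximation but lowers the acceptance rate. This is the same trade-off that the parameter $\epsilon$ controls in the HMM setting.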

As noted by [HOH], given an appropriate structure for the likelihood (such as i.i.d. data), one can often achieve a more accurate approximation by removing the summary statistics and focusing upon the probabilistic structure of the likelihood. Returning now to the HMM specified in Section 1, and following [22], [ES], we consider the ABC approximation of the joint smoothing density, for $\epsilon>0$:

$$\hat\eta_n^{\epsilon}(x_{0:n})=\frac{\Big[\prod_{i=0}^{n}\int_{\mathbb{R}^{d_y}}\phi\big(\tfrac{u-y_i}{\epsilon}\big)g(x_i,u)\,du\Big]\eta_0(x_0)\prod_{j=1}^{n}f(x_{j-1},x_j)}{\int_{\mathbb{R}^{(n+1)d_x}}\Big[\prod_{i=0}^{n}\int_{\mathbb{R}^{d_y}}\phi\big(\tfrac{u-y_i}{\epsilon}\big)g(x_i,u)\,du\Big]\eta_0(x_0)\prod_{j=1}^{n}f(x_{j-1},x_j)\,dx_{0:n}} \tag{6}$$

where $\phi\big(\frac{u-y}{\epsilon}\big)$ is a potential, which can take the form $\mathbb{I}_{\{u:|(u-y)/\epsilon|<1\}}(u)$ or that of a probability density function. This particular ABC approximation maintains the Markovian structure of the model, which helps to facilitate computational algorithms. In particular, to approximate $\mathbb{E}_\epsilon[V_n(X_{0:n})\mid y_{0:n}]$, where $\mathbb{E}_\epsilon[\cdot\mid y_{0:n}]$ is an expectation w.r.t. the ABC smoothing distribution, one still needs to resort to numerical methods such as SMC and MCMC.

From the perspective of theoretical justification, results on the consistency (from both classical and Bayesian perspectives) of static parameter estimates associated to the ABC approximation, as n grows, can be found in [12, 13]. There is an intrinsic asymptotic bias, but this bias can be removed by using a noisy version of ABC; see for further details. In this article we do not address the idea of noisy ABC.



2.2 ABC Error 

To ascertain the potential of an ABC approximation of HMMs for smoothed additive functionals, we investigate the ABC error. The analysis that follows concentrates on the scenario in which the ABC kernel is not the indicator function. In particular, this allows the inclusion of (A1)-3 below, which facilitates the theoretical analysis of the SMC error of the ABC smoother in Section 4. We remark that the result below for the ABC bias can also be established when $\phi$ is an indicator function, with some minor modifications to the proof.

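In the subsequent analysis we adopt the following assumption.

(A1) 1. There exists a $1<\rho<\infty$ such that for each $x,x'\in\mathbb{R}^{d_x}$ and $y\in\mathbb{R}^{d_y}$

$$\rho^{-1}\le g(x,y)\le\rho,\qquad \rho^{-1}\le f(x,x')\le\rho.$$

2. There exists an $L<\infty$ such that for every $y,y'\in\mathbb{R}^{d_y}$

$$\sup_{x\in\mathbb{R}^{d_x}}|g(x,y)-g(x,y')|\le L\,|y-y'|$$

with $|\cdot|$ the $\mathbb{L}_1$-norm.

3. There exist functions $\underline{a},\overline{a}:\mathbb{R}_+\to\mathbb{R}_+$ such that for any $y,u\in\mathbb{R}^{d_y}$ and $\epsilon\in\mathbb{R}_+$

$$\underline{a}(\epsilon)\le\phi\Big(\frac{y-u}{\epsilon}\Big)\le\overline{a}(\epsilon)$$

with $\delta(\epsilon):=\overline{a}(\epsilon)/\underline{a}(\epsilon)$ monotonically decreasing. In addition, $\int_{\mathbb{R}^{d_y}}\phi\big(\frac{u-y}{\epsilon}\big)\,dy=1$ with $\int_{\mathbb{R}^{d_y}}|y|\,\phi(y)\,dy<+\infty$.

4. The analysis is associated to additive functionals $V_n:\mathbb{R}^{(n+1)d_x}\to\mathbb{R}$,

$$V_n(x_{0:n})=\sum_{p=0}^{n}v_p(x_{p-1:p}),$$

with $\bar v:=\sup_{0\le p<\infty}\|v_p\|<+\infty$.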



The assumptions are rather strong and will typically only hold when the observations and hidden states lie in compact state spaces. They are, however, quite typical of the assumptions employed in the analysis of SMC and related approximation methods. Of these assumptions, perhaps (A1)-3 deserves discussion. It can be verified, for example, when the observation space is compact and

$$\phi\Big(\frac{y-u}{\epsilon}\Big)\propto\exp\Big\{-\Big(\frac{y-u}{\epsilon}\Big)^2\Big\}.$$

In practice one selects $\phi$ oneself, so this is not a demanding assumption.

2.2.1 Result 

We have the following result, whose proof is in Appendix B.

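Theorem 2.1. Assume (A1). Then there exists a $C<+\infty$ such that for any $\epsilon>0$, $n\ge1$ and $y_{0:n}$,

$$\big|\mathbb{E}[V_n(X_{0:n})\mid y_{0:n}]-\mathbb{E}_\epsilon[V_n(X_{0:n})\mid y_{0:n}]\big|\le C\epsilon(n+1)$$

where $\mathbb{E}[\cdot\mid y_{0:n}]$ and $\mathbb{E}_\epsilon[\cdot\mid y_{0:n}]$ are the expectations w.r.t. the joint smoothing and ABC smoothing distributions, respectively.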






Remark 2.1. The result establishes that the ABC error grows no faster than linearly in time and in $\epsilon$. This is important, because the SMC error when estimating $\mathbb{E}_\epsilon[V_n(X_{0:n})\mid y_{0:n}]$ is known to also grow at most linearly in time [15]. As a result, as the time parameter increases, the overall error will not necessarily be dominated by one source of error (SMC or ABC). This suggests that an ABC approximation can perform reasonably well in general.



3 Simulation-Based Methods 
3.1 SMC Methods 

In the context of HMMs, SMC algorithms approximate $\{\hat\eta_n\}$ recursively by propagating a collection of properly weighted samples, called particles, using a combination of importance sampling and resampling steps. For the importance sampling part of the algorithm, at each step n we use general proposal (Markov) kernels $H_n$ whose normalizing constants do not depend on the simulated paths. A typical SMC algorithm is given below (we assume it terminates at time p+1):



0. Initialisation: set n = 0; for $i\in\{1,\dots,N\}$ sample $X_0^i\sim\eta_0$ and compute

$$G_0(x_0^i)=g(x_0^i,y_0)$$

with $W_0^i=G_0(x_0^i)$.

1. Decide whether or not to resample; if resampling is performed, set all weights $\{W_n^i\}_{1\le i\le N}$ to 1. Proceed to step 2.

2. Set n = n+1; if n = p+1, stop. Otherwise, for $i\in\{1,\dots,N\}$ sample $X_n^i\mid x_{n-1}^i\sim H_n(x_{n-1}^i,\cdot)$, compute

$$G_n(x_{n-1:n}^i)=\frac{g(x_n^i,y_n)\,f(x_{n-1}^i,x_n^i)}{H_n(x_{n-1}^i,x_n^i)}$$

and set $W_n^i=G_n(x_{n-1:n}^i)\,W_{n-1}^i$. Return to the start of step 1.
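The steps above can be sketched in Python. This is a minimal sketch under simplifying assumptions that are ours rather than the algorithm's:

- the proposal is the state transition, $H_n=f$, so that $G_n(x_{n-1:n})=g(x_n,y_n)$;
- the decision in step 1 is the effective-sample-size rule of Section 3.2, with threshold N/2;
- weights are kept on the log scale for numerical stability.

The function names and signatures are hypothetical.

```python
import numpy as np


def logsumexp(a):
    """Numerically stable log(sum(exp(a)))."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))


def bootstrap_smc(y, N, sample_init, sample_transition, log_g, rng, ess_frac=0.5):
    """Bootstrap SMC (H_n = f) with ESS-triggered multinomial resampling.

    Returns the particles and normalised weights at each time, together with
    the log of the standard estimate of the normalising constant.
    """
    x = sample_init(N, rng)                   # step 0: X_0^i ~ eta_0
    logw = log_g(x, y[0])                     # log W_0^i = log G_0(x_0^i)
    log_Z = logsumexp(logw) - np.log(N)
    xs, ws = [x], [np.exp(logw - logsumexp(logw))]
    for n in range(1, len(y)):
        lw_bar = logw - logsumexp(logw)       # log normalised weights at n - 1
        w_bar = np.exp(lw_bar)
        if 1.0 / np.sum(w_bar ** 2) < ess_frac * N:   # step 1: resample?
            idx = rng.choice(N, size=N, p=w_bar)
            x = x[idx]
            logw = np.zeros(N)                # reset all weights to 1
            lw_bar = np.full(N, -np.log(N))
        x = sample_transition(x, rng)         # step 2: X_n^i ~ f(x_{n-1}^i, .)
        inc = log_g(x, y[n])                  # log G_n = log g(x_n^i, y_n)
        log_Z += logsumexp(lw_bar + inc)      # estimate of log(Z_n / Z_{n-1})
        logw = logw + inc                     # W_n^i = G_n W_{n-1}^i
        xs.append(x)
        ws.append(np.exp(logw - logsumexp(logw)))
    return xs, ws, log_Z
```

When resampling occurs at every step, the accumulated `log_Z` reduces to the logarithm of the product form (8).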
3.2 Some details on resampling 

If one implements SMC without resampling steps, i.e. performs sequential importance sampling, the variance of the weights $\{W_n^i\}_{1\le i\le N}$ typically increases as time progresses. This is commonly referred to as weight degeneracy. To counter it, resampling is used. The particles are sampled with replacement according to the normalized weights $\{\bar W_n^i\}_{1\le i\le N}$, given by $\bar W_n^i=W_n^i/\sum_{j=1}^{N}W_n^j$, and then each $W_n^i$ is reset to 1. We remark that more efficient alternatives to this multinomial scheme are possible; see e.g. [18].
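For concreteness, the sketch below contrasts multinomial resampling, the scheme just described, with systematic resampling. Systematic resampling is one commonly used lower-variance alternative; we do not claim it is the specific scheme studied in [18]. Both hypothetical helper functions take normalized weights $\bar W_n^{1:N}$ and return ancestor indices.

```python
import numpy as np


def multinomial_resample(w_bar, rng):
    """N i.i.d. draws of ancestor indices with probabilities w_bar."""
    N = len(w_bar)
    return rng.choice(N, size=N, p=w_bar)


def systematic_resample(w_bar, rng):
    """A single uniform shared across the N positions (u + k) / N, k = 0..N-1."""
    N = len(w_bar)
    positions = (rng.uniform() + np.arange(N)) / N
    cumulative = np.cumsum(w_bar)
    cumulative[-1] = 1.0                     # guard against round-off in the sum
    # index i such that cumulative[i-1] <= position < cumulative[i]
    return np.searchsorted(cumulative, positions, side="right")
```

Under systematic resampling, particle i is copied either $\lfloor N\bar W_n^i\rfloor$ or $\lceil N\bar W_n^i\rceil$ times. Under multinomial resampling, the number of copies is random with mean $N\bar W_n^i$.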

If one resamples too often, the simulated pasts of the particle paths become very similar to each other. This is documented as the path degeneracy problem. A common remedy is to resample only when an appropriate criterion drops below, or rises above, some threshold. A common criterion of the former type is the effective sample size $\big(\sum_{i=1}^{N}(\bar W_n^i)^2\big)^{-1}$ [23]. This approach, however, does not ultimately solve the path degeneracy problem. Path degeneracy has been a long-standing bottleneck when static parameters $\theta$ are estimated online using SMC methods by augmenting them to the latent state; see [IB]. Consider the central limit theorem (CLT) associated to the SMC estimate of $\mathbb{E}[V_n(X_{0:n})\mid y_{0:n}]$:

$$\sum_{i=1}^{N}\bar W_n^i\,V_n(x_{0:n}^i).$$

Path degeneracy leads, even under very strong conditions on the HMM, to an asymptotic variance in this CLT that grows quadratically in n; see [27].

Suppose one resamples multinomially at every iteration, except when n = p. Denote by $a_n^i\in\{1,\dots,N\}$ the resampled index of the time-n ancestor of particle i at time n+1. It is a random variable taking the value j with probability $\bar W_n^j$. The joint density of the sampled particles and the resampled indices is

$$\psi\big(x_{0:p}^{1:N},a_{0:p-1}\big)=\Big(\prod_{i=1}^{N}\eta_0(x_0^i)\Big)\prod_{n=1}^{p}\prod_{i=1}^{N}\bar W_{n-1}^{a_{n-1}^i}\,H_n\big(x_{n-1}^{a_{n-1}^i},x_n^i\big), \tag{7}$$

where the complete genealogy of ancestors at time n is denoted $a_n=(a_n^1,\dots,a_n^N)$ and the randomly simulated values of the state $x_n=(x_n^1,\dots,x_n^N)$. Together they form the following SMC approximation of $\hat\eta_p$:



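$$\hat\eta_p^N(dx_{0:p})=\sum_{j=1}^{N}\bar W_p^j\,\delta_{x_{0:p}^{(j)}}(dx_{0:p}),$$

with $x_{0:p}^{(j)}$ the ancestral path of particle j (defined below),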

and an approximation of the normalizing constant

$$\hat Z_p=\prod_{n=0}^{p}\Big\{\frac{1}{N}\sum_{j=1}^{N}G_n\big(x_{n-1:n}^j\big)\Big\}. \tag{8}$$



The complete ancestral genealogy at each time can always be traced back by defining an ancestry sequence $b_{0:p}^i$ for every $i\in\{1,\dots,N\}$. For $n\in\{0,\dots,p-1\}$, its elements are given by the backward recursion $b_n^i=a_n^{b_{n+1}^i}$, where $b_p^i=i$. This interpretation of SMC approximations was introduced in [5]. It will be used later, together with $\psi(x_{0:p}^{1:N},a_{0:p-1})$, for describing PMCMC.
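The backward recursion $b_n^i=a_n^{b_{n+1}^i}$ translates directly into code. The following sketch is illustrative: the function names and the storage convention are our own assumptions. `ancestors[n]` holds the vector $a_n$ as 0-based indices, whereas the text uses 1-based indices. The sketch recovers the lineage of one final-time particle and its ancestral path $(x_0^{b_0^i},\dots,x_p^{b_p^i})$.

```python
import numpy as np


def trace_ancestry(ancestors, i):
    """Ancestral lineage b_{0:p}^i of particle i at the final time p.

    ancestors[n] holds a_n = (a_n^1, ..., a_n^N) for n = 0, ..., p - 1, where
    a_n^j is the time-n index of the parent of particle j at time n + 1
    (0-based indices here, 1-based in the text).
    """
    p = len(ancestors)
    b = np.empty(p + 1, dtype=int)
    b[p] = i                                  # b_p^i = i
    for n in range(p - 1, -1, -1):
        b[n] = ancestors[n][b[n + 1]]         # b_n^i = a_n^{b_{n+1}^i}
    return b


def ancestral_path(particles, ancestors, i):
    """The path (x_0^{b_0^i}, ..., x_p^{b_p^i}) of particle i."""
    b = trace_ancestry(ancestors, i)
    return np.array([particles[n][b[n]] for n in range(len(b))])
```

Tracing all N lineages in this way typically reveals that they coalesce a few steps back in time, which is the path degeneracy discussed in Section 3.2.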

3.3 Forward only Smoothing 

Due to the path degeneracy effect, one does not want to use the SMC approximation $\hat\eta_n^N(dx_{0:n})$ to perform smoothing. One potential solution to this issue is the forward filtering backward smoothing algorithm, and in particular its forward-only implementation in [16]. The FFBS algorithm includes a backward simulation step, which is eliminated in [15]. We consider the SMC approximation of the expectation $\mathbb{E}[V_n(X_{0:n})\mid y_{0:n}]$, where $V_n(x_{0:n})=\sum_{p=0}^{n}v_p(x_{p-1:p})$.

The construction of the procedure is as follows. It is first noted that

$$\mathbb{E}[V_n(X_{0:n})\mid y_{0:n}]=\int\mathcal{V}_n(x_n)\,\hat\eta_n(x_{0:n})\,dx_{0:n}$$

with, for $n\ge1$,

$$\mathcal{V}_n(x_n):=\int V_n(x_{0:n})\,\hat\eta_n(x_{0:n-1}\mid x_n)\,dx_{0:n-1},$$

and $\mathcal{V}_0(x_0)=v_0(x_0)$, where $\hat\eta_n(x_{0:n-1}\mid x_n)=\hat\eta_n(x_{0:n})/\int\hat\eta_n(x_{0:n})\,dx_{0:n-1}$. One can then establish (e.g. [16]) that

$$\mathcal{V}_n(x_n)=\int\big[\mathcal{V}_{n-1}(x_{n-1})+v_n(x_{n-1:n})\big]\,\hat\eta_n(x_{n-1}\mid x_n)\,dx_{n-1}$$

where $\hat\eta_n(x_{n-1}\mid x_n)=\int\hat\eta_n(x_{0:n-1}\mid x_n)\,dx_{0:n-2}$.

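These recursions lead to the following idea. Suppose we have the current particle approximation $\{\bar W_{n-1}^i,x_{n-1}^i\}_{1\le i\le N}$ of the marginal of $\hat\eta_{n-1}$, together with an approximation of $\{\mathcal{V}_{n-1}(x_{n-1}^i)\}_{1\le i\le N}$, written $\{\mathcal{V}_{n-1}^N(x_{n-1}^i)\}_{1\le i\le N}$. One then updates the SMC approximation as in Section 3.1 and sets

$$\mathcal{V}_n^N(x_n^i)=\frac{\sum_{j=1}^{N}\bar W_{n-1}^j f(x_{n-1}^j,x_n^i)\big[\mathcal{V}_{n-1}^N(x_{n-1}^j)+v_n(x_{n-1}^j,x_n^i)\big]}{\sum_{j=1}^{N}\bar W_{n-1}^j f(x_{n-1}^j,x_n^i)},\quad i\in\{1,\dots,N\}, \tag{9}$$

with $\mathcal{V}_0^N=\mathcal{V}_0$. The SMC approximation of $\mathbb{E}[V_n(X_{0:n})\mid y_{0:n}]$ is then

$$\sum_{i=1}^{N}\bar W_n^i\,\mathcal{V}_n^N(x_n^i). \tag{10}$$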

The computational cost of this recursion is $O(N^2)$ per time step. For functions such as $V_n(x_{0:n})=\sum_{p=0}^{n}v_p(x_{p-1:p})$, it has been shown that, under some assumptions, the asymptotic variance in the CLT associated to (10) grows at most linearly in n. This is in contrast to growth at least quadratic in n, under similar assumptions, for the standard SMC estimate; see [16], and also [17] for additional theoretical analysis.
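The recursion (9)-(10) can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the numerical studies. It assumes a one-dimensional state, the bootstrap proposal $H_n=f$ and multinomial resampling at every step. `log_f`, `log_g`, `v0` and `v` are hypothetical user-supplied vectorized callables. The explicit N×N matrix of backward weights makes the $O(N^2)$ per-step cost visible.

```python
import numpy as np


def _log_normalise(logw):
    """Log of normalised weights, computed stably."""
    m = np.max(logw)
    return logw - (m + np.log(np.sum(np.exp(logw - m))))


def forward_smoothing(y, N, sample_init, sample_transition, log_f, log_g, v0, v, rng):
    """Forward-only SMC estimate (10) of E[V_n(X_{0:n}) | y_{0:n}].

    One-dimensional states, bootstrap proposal H_n = f, multinomial
    resampling at every step; cost O(N^2) per time step.
    """
    x = sample_init(N, rng)                        # X_0^i ~ eta_0
    lw = _log_normalise(log_g(x, y[0]))            # log bar W_0^i
    V = v0(x)                                      # V_0^N(x_0^i) = v_0(x_0^i)
    for n in range(1, len(y)):
        idx = rng.choice(N, size=N, p=np.exp(lw))  # multinomial resampling
        x_new = sample_transition(x[idx], rng)     # X_n^i ~ f(x_{n-1}^{a^i}, .)
        # Row i, column j: log[bar W_{n-1}^j f(x_{n-1}^j, x_n^i)]
        la = lw[None, :] + log_f(x[None, :], x_new[:, None])
        la = la - la.max(axis=1, keepdims=True)
        B = np.exp(la)
        B /= B.sum(axis=1, keepdims=True)          # normalised backward weights
        # Recursion (9): backward-weighted average of V_{n-1}^N + v_n
        V = np.sum(B * (V[None, :] + v(n, x[None, :], x_new[:, None])), axis=1)
        x = x_new
        lw = _log_normalise(log_g(x, y[n]))        # log bar W_n^i
    return float(np.sum(np.exp(lw) * V))           # estimate (10)
```

Only the current particles, weights and $\mathcal{V}_n^N$ values are stored, so memory does not grow with n. For the ABC smoother, `log_g` would be replaced by the log of the ABC incremental weights described in Section 3.4, while `log_f` still requires the transition density.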
3.4 SMC for ABC



If one cannot, or does not want to, compute g(x,y), then the algorithm described in Section 3.1 can seldom be implemented. In such contexts, one can easily use an SMC algorithm to approximate the $\hat\eta_n^\epsilon$ in (6). For example, in [55], at time 0 one samples the signal from $\eta_0$ and the pseudo-observations from the likelihood, to yield the incremental weight

$$G_0(u_0^i)=\phi\Big(\frac{u_0^i-y_0}{\epsilon}\Big)$$

where $u_0^i$ is the pseudo-observation at time 0. At subsequent time-points, one samples from the signal transition and the likelihood to obtain $G_n(u_n^i)=\phi\big(\frac{u_n^i-y_n}{\epsilon}\big)$. The selection of $\epsilon$ can be adaptive, and proposals other than the state dynamics can be adopted; we refer to [22] for some discussion.
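The ABC incremental weight requires only the ability to simulate from $g(x,\cdot)$. The following sketch assumes states of shape (N, $d_x$) and a user-supplied sampler returning pseudo-observations of shape (N, $d_y$). It returns $\log\phi\big(\frac{u^i-y_n}{\epsilon}\big)$ for two illustrative kernel choices: the indicator of the $\mathbb{L}_1$ ball (our reading of $|\cdot|$) and an unnormalized Gaussian. The function and argument names are hypothetical. The returned log weights can be used wherever an exact log-likelihood $\log g(x_n,y_n)$ would otherwise enter the SMC weights.

```python
import numpy as np


def abc_log_weight(x, y_n, eps, sample_obs, rng, kernel="indicator"):
    """Log ABC incremental weight log phi((u^i - y_n) / eps), with u^i ~ g(x^i, .).

    kernel="indicator": phi(z) = 1 if |z|_1 < 1, else 0 (log weight 0 or -inf).
    kernel="gaussian":  phi(z) proportional to exp(-|z|_2^2 / 2).
    """
    u = sample_obs(x, rng)                  # pseudo-observations, shape (N, d_y)
    z = (u - y_n) / eps
    if kernel == "indicator":
        inside = np.sum(np.abs(z), axis=1) < 1.0
        return np.where(inside, 0.0, -np.inf)
    if kernel == "gaussian":
        return -0.5 * np.sum(z ** 2, axis=1)
    raise ValueError(f"unknown kernel: {kernel}")
```

With the indicator kernel, every weight at a given time can be zero (every log weight equal to $-\infty$), in which case the SMC algorithm cannot proceed. Increasing $\epsilon$ avoids this, at the price of a larger ABC bias.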

A drawback of this algorithm is that, when $d_y$ grows with $\epsilon$ and N fixed, one cannot expect the algorithm to work well for every $\epsilon$. Typically one must increase $\epsilon$ to obtain reasonable algorithmic performance, and this comes at the cost of increasing the bias (see Theorem 2.1). Keeping $\epsilon$ at a reasonable level requires more advanced strategies, which are not investigated here.

In scenarios where $\phi\big(\frac{u-y}{\epsilon}\big)=\mathbb{I}_{\{u:|(u-y)/\epsilon|<1\}}(u)$, a potentially better procedure is to use the rejection kernel in [13] (note that this differs from the ideas of [H]). In this case, one initializes the SMC algorithm as above. However, at subsequent time-points $n\ge1$, one uses the kernel

$$K_n\big((u_{n-1}^i,x_{n-1}^i),(u_n,x_n)\big)=G_{n-1}(u_{n-1}^i)\,H_n\big((u_{n-1}^i,x_{n-1}^i),(u_n,x_n)\big)+\big[1-G_{n-1}(u_{n-1}^i)\big]\sum_{j=1}^{N}\frac{G_{n-1}(u_{n-1}^j)}{\sum_{l=1}^{N}G_{n-1}(u_{n-1}^l)}\,H_n\big((u_{n-1}^j,x_{n-1}^j),(u_n,x_n)\big). \tag{11}$$

In this case, at any given time-step, only those particles with $|u_{n-1}-y_{n-1}|\ge\epsilon$ are resampled. It has been shown in [14, pp. 304-305] that this kernel produces a lower asymptotic variance in the CLT than an algorithm which resamples at every time step. It is of interest to see whether this advantage is realized when N is finite, especially versus the dynamic resampling mentioned in Section 3.1. This particular SMC approach is termed 'rejection SMC' (RSMC) throughout the article.
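The selection step implied by the kernel (11) is short to write down. Particle i keeps itself as parent with probability $G_{n-1}(u_{n-1}^i)$; otherwise, it draws a parent j with probability proportional to $G_{n-1}(u_{n-1}^j)$. The sketch below is a hypothetical helper, not code from [13] or [14]. It returns the parent indices only; the move through $H_n$ is applied afterwards. It assumes at least one $G_{n-1}(u_{n-1}^j)>0$.

```python
import numpy as np


def rejection_resample(G, rng):
    """Parent indices under the rejection kernel (11).

    G[i] = G_{n-1}(u_{n-1}^i) in [0, 1]. Particle i keeps itself with
    probability G[i]; otherwise its parent is drawn with probability
    proportional to G. With the indicator kernel (G in {0, 1}), only particles
    whose pseudo-observation fell outside the epsilon-ball are resampled.
    """
    N = len(G)
    keep = rng.uniform(size=N) < G          # always true when G[i] = 1
    idx = np.arange(N)
    n_move = np.count_nonzero(~keep)
    if n_move:
        idx[~keep] = rng.choice(N, size=n_move, p=G / G.sum())
    return idx
```

When every particle lies inside the $\epsilon$-ball, no resampling takes place at all. This is the source of the variance reduction relative to resampling at every step.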



When using SMC for the ABC approximation of $\mathbb{E}[V_n(X_{0:n})\mid y_{0:n}]$, the procedure in Section 3.3 can be followed, with modifications only in notation and state spaces.



3.5 PMCMC 

In this section we consider the scenario where one has unknown static parameters $\theta\in\mathbb{R}^{d_\theta}$ associated to the HMM. We concentrate upon batch inference.

Particle Markov Chain Monte Carlo methods are MCMC algorithms operating on an extended state-space 
and targeting an extended distribution over the random variables appearing in the SMC algorithm. As in standard 
MCMC the idea is to run an ergodic Markov chain to obtain samples from the distribution of interest. The difference 
lies in the fact that, due to using an SMC approximation to generate a proposal, the invariant distribution of the 
simulated chain is defined on an extended state space, with an appropriate marginal being the distribution that we 
are interested in sampling from in the first place. 

We present the particle marginal Metropolis-Hastings (PMMH) algorithm of [2]. The PMMH algorithm can sample from the target distribution

$$\hat\eta_n(x_{0:n},\theta)\propto\Big[\prod_{i=0}^{n}g_\theta(x_i,y_i)\Big]\eta_0(x_0)\prod_{j=1}^{n}f_\theta(x_{j-1},x_j)\,\pi(\theta) \tag{12}$$



where $\pi(\theta)$ is the prior on $\theta$. We present the algorithm for the scenario where one is interested in the original HMM. The ABC extension is simple: it just uses the SMC procedures described in Section 3.4 instead of those at the start of Section 3.1. Note also that the algorithm is given for an SMC algorithm that resamples at each time-step; a dynamic resampling schedule can also be used.
The PMMH algorithm proceeds as follows: 

• 0. Set $\theta(0)$. Sample $x_{0:p}(0)^{1:N},a_{0:p-1}(0)$ from (7) (which now depends upon $\theta$). Sample $k\in\{1,\dots,N\}$ from $\bar W_p$ and compute $\hat Z_p(0)$ as in (8). Store $\hat Z_p(0),k(0),x_{0:p}(0)^{1:N},a_{0:p-1}(0),\theta(0)$. Set i = 1.

• 1. Propose a new $\theta'$ from a candidate $q(\theta(i-1),\cdot)$, and sample $x_{0:p}'^{1:N},a_{0:p-1}'$ and $k'$ as in step 0. Accept this as the new state of the chain with probability

$$1\wedge\frac{\hat Z_p'\,\pi(\theta')\,q(\theta',\theta(i-1))}{\hat Z_p(i-1)\,\pi(\theta(i-1))\,q(\theta(i-1),\theta')}.$$

If we accept, set $\big(\hat Z_p(i),k(i),x_{0:p}(i)^{1:N},a_{0:p-1}(i),\theta(i)\big)=\big(\hat Z_p',k',x_{0:p}'^{1:N},a_{0:p-1}',\theta'\big)$; otherwise set

$$\big(\hat Z_p(i),k(i),x_{0:p}(i)^{1:N},a_{0:p-1}(i),\theta(i)\big)=\big(\hat Z_p(i-1),k(i-1),x_{0:p}(i-1)^{1:N},a_{0:p-1}(i-1),\theta(i-1)\big).$$

• 2. Set i = i+1 and return to 1.
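A compact Python sketch of the PMMH loop follows, under simplifications of our own:

- the candidate q is a symmetric Gaussian random walk, so the q terms cancel in the acceptance ratio;
- `log_z_hat(theta, rng)` is a hypothetical user-supplied function returning $\log\hat Z_p$ from (8) for parameter $\theta$ (for the ABC version it would be built from one of the algorithms of Section 3.4);
- for brevity, the sampled index k and the particle system are not stored.

What the sketch makes explicit is that the estimate attached to the current state is recycled, not recomputed, at every iteration.

```python
import numpy as np


def pmmh(theta0, log_z_hat, log_prior, n_iter, step, rng):
    """PMMH with a symmetric Gaussian random-walk candidate q."""
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    log_z = log_z_hat(theta, rng)                  # step 0
    chain = np.empty((n_iter, theta.size))
    n_accept = 0
    for i in range(n_iter):
        prop = theta + step * rng.normal(size=theta.shape)  # theta' ~ q(theta(i-1), .)
        lp_prop = log_prior(prop)
        if np.isfinite(lp_prop):                   # zero prior density: reject outright
            log_z_prop = log_z_hat(prop, rng)      # fresh SMC run at theta'
            log_ratio = log_z_prop + lp_prop - log_z - log_prior(theta)
            if np.log(rng.uniform()) < log_ratio:
                theta, log_z = prop, log_z_prop    # accept; keep the new estimate
                n_accept += 1
        chain[i] = theta                           # on rejection, the old state is repeated
    return chain, n_accept / n_iter
```

Recomputing $\hat Z_p$ for the current state at each iteration would break the exactness property established in [2]. Storing it, as above, leaves the invariant distribution correct for any N.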



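In [2] it is shown that the sequence $\{x_{0:p}(i),\theta(i)\}_{i\ge1}$, where $x_{0:p}(i)$ denotes the path of particle $k(i)$, provides an approximation of (12), for any $N\ge1$. Suppose one is interested in approximating, say,

$$\int V_n(x_{0:n},\theta)\,\hat\eta_n(x_{0:n},\theta)\,d(x_{0:n},\theta).$$

As noted by [25], the FFBS estimate can be used, based upon the SMC run at each iteration, by simply extending the definition of $\mathcal{V}_n^N$ in (9) to include $\theta$ (cf. (10)):

$$\frac{1}{M}\sum_{i=1}^{M}\sum_{j=1}^{N}\bar W_n^j(i)\,\mathcal{V}_n^N\big(x_n^j(i),\theta(i)\big)$$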



where the first summation is over M iterations of the PMMH algorithm. One can trivially extend this to the case where only a forward pass, as in Section 3.3, is used. A critical question, which to our knowledge is presently open, is whether this is worthwhile. Despite the improvement in the SMC estimation, the gain must be weighed against the increase in computational cost and the iterative nature of the MCMC, especially in an ABC context. We remark again that any of the SMC for ABC algorithms mentioned in Section 3.4 can be adopted when considering the ABC approximation of (12). Whether one uses dynamic resampling or the kernel (11), the estimate of the normalizing constant is unbiased; see [2] for why this is of interest.



4 Theoretical Analysis 
4.1 Set-Up 

We consider the error in estimating smoothed additive functionals when using an ABC approximation of the HMM, in the scenario where one does not need to estimate static parameters. Recall that we have already considered the ABC error in Theorem 2.1. The main objective here is to present a result on the SMC error and on its overall effect on the approximation of $\mathbb{E}[V_n(X_{0:n})\mid y_{0:n}]$. We use (A1), which only applies when one uses a kernel density in the ABC approximation (i.e. not an indicator function). In addition, the SMC algorithm samples from the transition density of the state, with multinomial resampling at every time step (that is, RSMC is not considered). These hypotheses can be removed with a more technical proof. We also condition upon the data and do not treat their randomness. We simply assume that we are given a data set, and do not address whether or not it originates from an HMM.



4.2 Result 



Below, the $\mathbb{L}_p$-norm is associated to the random process generated by the SMC algorithm. We also use the abuse of notation $G_{n,\epsilon}(x)$, $x\in\mathbb{R}^{d_x}$, to represent the incremental weights of the SMC algorithm (as described in Section 3.4). Note that, in comparison to (10), we resample at every time-point, so we can use the incremental weights in the estimate instead of the normalized weights:

$$\Xi_n[\epsilon,N,V_n,y_{0:n}]=\sum_{i=1}^{N}\frac{G_{n,\epsilon}(x_n^i)}{\sum_{j=1}^{N}G_{n,\epsilon}(x_n^j)}\,\mathcal{V}_{n,\epsilon}^N(x_n^i) \tag{13}$$

where $\Xi_n[\epsilon,N,V_n,y_{0:n}]$ is the quantity discussed in Section 1. From here on we use the r.h.s. of (13) to denote the SMC estimate.
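Theorem 4.1. Assume (A1). There exists a $C<+\infty$ and, for any $p\ge1$, there exists an $\alpha_p<\infty$ such that for any $\epsilon>0$, $N\ge1$, $n\ge1$ and $y_{0:n}$:

$$\Big\|\sum_{i=1}^{N}\frac{G_{n,\epsilon}(x_n^i)}{\sum_{j=1}^{N}G_{n,\epsilon}(x_n^j)}\,\mathcal{V}_{n,\epsilon}^N(x_n^i)-\mathbb{E}[V_n(X_{0:n})\mid y_{0:n}]\Big\|_p\le(n+1)\Big(\frac{\alpha_p\,\bar\delta(\epsilon)}{\sqrt{N}}+C\epsilon\Big)$$

where $\bar\delta(\epsilon)=\max\{\delta(\epsilon)\max\{\delta(\epsilon)^2,\delta(\epsilon)^4\},\delta(\epsilon)^3\}$, with $\delta(\epsilon)$ as in (A1), and $\mathbb{E}[\cdot\mid y_{0:n}]$ is the expectation w.r.t. the joint smoothing distribution.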



Proof. After adding and subtracting $\mathbb{E}_\epsilon[V_n(X_{0:n})\mid y_{0:n}]$, one can apply Minkowski's inequality followed by Theorem A.1 and Theorem 2.1 to conclude. □



Remark 4.1. The bound is decomposed into two sources of error. For the SMC approximation, the error tends to decrease as $\epsilon$ grows, as one would expect. Conversely, the ABC error term $|\mathbb{E}_\epsilon[V_n(X_{0:n})\mid y_{0:n}]-\mathbb{E}[V_n(X_{0:n})\mid y_{0:n}]|$ grows as $\epsilon$ grows. Both error terms increase at most linearly with the time parameter. When $V_n(x_{0:n})=\sum_{p=0}^{n}v_p(x_p)/(n+1)$, one can remove the linear dependence on n in the bias term. In addition, with N fixed, the SMC error can then be shown to decrease with n; see [W], [16].



5 Simulations 

5.1 Model 

Our numerical studies are implemented on the following HMM, also considered in, for example, [2]. We take $d_x=d_y=d$, and the model is

$$X_n=\frac{X_{n-1}}{2}+25\,\frac{X_{n-1}}{1+X_{n-1}^2}+8\cos(1.2n)+\zeta_{x,n},\quad n\ge1,$$

$$Y_n=\frac{X_n^2}{20}+\zeta_{y,n},\quad n\ge0,$$

with $\zeta_{x,n}\overset{\text{i.i.d.}}{\sim}\mathcal{N}_d(0,\sigma_x^2I_d)$ and, independently, $\zeta_{y,n}\overset{\text{i.i.d.}}{\sim}\mathcal{N}_d(0,\sigma_y^2I_d)$, and $X_0=0_d$. Here $0_d$ is the d-dimensional zero vector, $I_d$ the $d\times d$ identity matrix and $\mathcal{N}_d(\mu,\Sigma)$ the d-dimensional normal distribution with mean $\mu$ and covariance matrix $\Sigma$. The conditional density of the observations given the state is not intractable. However, this model facilitates an investigation into the accuracy of ABC, because one can obtain an approximation of the 'correct' answers using SMC/PMCMC with many particles/iterations.
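For reproducibility of the data-generating step, a minimal simulation sketch of the model above is given below. Two assumptions are ours: for d > 1 all operations (squares, ratios, the cosine shift) act componentwise; and the defaults $\sigma_x^2=10$, $\sigma_y^2=1$ are taken from Section 5.2.1. The function name `simulate_hmm` is hypothetical.

```python
import numpy as np


def simulate_hmm(T, d, sigma_x2=10.0, sigma_y2=1.0, rng=None):
    """Simulate x_{0:T} and y_{0:T}; operations act componentwise in d dimensions."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.zeros((T + 1, d))                      # X_0 = 0_d
    for n in range(1, T + 1):
        xp = x[n - 1]
        x[n] = (xp / 2 + 25 * xp / (1 + xp ** 2) + 8 * np.cos(1.2 * n)
                + rng.normal(0.0, np.sqrt(sigma_x2), size=d))
    y = x ** 2 / 20 + rng.normal(0.0, np.sqrt(sigma_y2), size=(T + 1, d))
    return x, y
```

Because $Y_n$ depends on $X_n$ only through $X_n^2$, the posterior is typically bimodal in the sign of each state component. This makes the model a demanding test case for both SMC and ABC.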

For smoothing, the objective of our numerical study is to assess the accuracy of ABC when only forward-only smoothing is used. (The performance of forward-only smoothing relative to using the path of particles has been studied elsewhere, for example in [E].) We also investigate the worth of RSMC in the ABC context; recall the asymptotic improvements predicted in [14]. Along the way, we also consider the dimension of the HMM and the utility of using ABC in high dimensions. Finally, the time dependence of the errors is presented, to allow some investigation of Theorem 4.1. For PMCMC, we are concerned both with the accuracy of ABC for batch static parameter estimation and with the worth of including forward-only smoothing as a 'post-processing' of the MCMC output.

5.2 Smoothing 

5.2.1 Implementation Details 

We consider estimating the expected mean state over the observation period [0,100], i.e. $v_p(x_p)=x_p/101$, $p\in\{0,\dots,100\}$. We set $\sigma_x^2=10$ and $\sigma_y^2=1$. The data are simulated from the true model with these parameter values. To obtain a reference answer against which to assess the accuracy of the methods, we use the mean estimate from 50 implementations of the forward smoothing procedure targeting the exact model, each with 5000 particles.

The algorithms for SMC (that is, approximating the exact model) and SMC ABC are run for 10 different values of $N \in \{100, 200, \ldots, 1000\}$, which are labelled $N_1, \ldots, N_{10}$ in the Figures. The SMC ABC approach (i.e. the one that resamples dynamically) resamples when the effective sample size drops below $N/2$. For the SMC, SMC ABC and RSMC (which targets the ABC approximation), the hidden state dynamics are used as proposals. We also run the algorithms for $d \in \{1, 2, 5, 10\}$, which allows us to assess the accuracy of SMC for an ABC HMM in 'high' dimensions. To investigate the accuracy of ABC, we compute a true value for $\mathbb{E}[V_n(X_{0:n}) \mid y_{0:n}]$, as discussed above, and then average the $\mathbb{L}_1$-error of the estimate $S_n[\epsilon, N, V_n; y_{0:n}]$, calculated with respect to the computed true value, across its dimension, i.e.

$$e_n^{N,\epsilon} = \frac{1}{d}\,\big|\mathbb{E}[V_n(X_{0:n}) \mid y_{0:n}] - S_n[\epsilon, N, V_n; y_{0:n}]\big|,$$

with $|\cdot|$ the $L_1$-distance. An SMC procedure targeting the exact HMM is also run, to provide a benchmark; the corresponding error, $e_n^N$, is similarly calculated as the dimension-averaged $\mathbb{L}_1$-error of the SMC estimate $S_n[N, V_n; y_{0:n}]$ with respect to the same true value as above. All results are averaged over 50 independent runs.
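The error summary described above can be sketched as follows. This is a minimal illustration, assuming each run returns its $d$-dimensional smoothing estimate as a NumPy array; the function names are hypothetical, and the standard error is taken (as an assumption, since the text does not give the formula) to be the sample standard deviation over runs divided by the square root of the number of runs.

```python
import numpy as np

def dim_averaged_l1_error(true_value, estimate):
    """(1/d) * sum_k |true_k - estimate_k| for d-dimensional vectors."""
    true_value = np.asarray(true_value, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.abs(true_value - estimate).mean())

def summarize_runs(true_value, estimates):
    """Mean and standard error of the error over independent runs.

    estimates: array-like of shape (n_runs, d), one smoothing estimate per run.
    """
    errors = np.array([dim_averaged_l1_error(true_value, e) for e in estimates])
    mean = errors.mean()
    # standard error of the mean: sample std (ddof=1) / sqrt(number of runs)
    se = errors.std(ddof=1) / np.sqrt(len(errors))
    return float(mean), float(se)

# Usage with made-up numbers (d = 2, two runs)
mean, se = summarize_runs([1.0, 2.0], [[1.5, 2.0], [1.0, 1.0]])
```

Here the two runs have errors 0.25 and 0.5, so the mean is 0.375 and the standard error 0.125.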

For the ABC specification, we use the indicator kernel $\mathbb{I}_{\{u:\,|u - y_p| < \epsilon\}}(u)$; this allows us to easily understand the impact of the RSMC. In the implementations of the two SMC ABC schemes described in Section 3.4, $\epsilon$ is set to the smallest value obtainable in a preliminary set of runs, that is, the smallest $\epsilon$ for which the weights do not all become zero at any time point.
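A sketch of the dynamically resampling SMC ABC filter and of the preliminary search for $\epsilon$ is given below. It is not the exact scheme of Section 3.4: the callables `sample_init`, `sample_transition` and `simulate_obs` are hypothetical stand-ins for the model, the ball $|u - y_p| < \epsilon$ is taken in the max-norm for $d > 1$, and multinomial resampling is assumed. RSMC is not sketched.

```python
import numpy as np

def smc_abc_filter(y, eps, N, sample_init, sample_transition, simulate_obs, rng):
    """Sketch of an SMC ABC filter using the hidden dynamics as proposal.

    y: array of shape (T, d_y). At time n each particle simulates auxiliary
    data u_n^i and gets the indicator weight 1{|u_n^i - y_n| < eps}, with |.|
    taken here as the max-norm (an assumption for d_y > 1).
    Multinomial resampling is triggered when ESS < N/2.
    Returns a list of (particles, normalized weights), one per time step,
    or None if all weights are zero at some time step.
    """
    x = sample_init(N, rng)                     # particles, shape (N, d_x)
    w_prev = np.full(N, 1.0 / N)
    history = []
    for n, y_n in enumerate(y):
        if n > 0:
            x = sample_transition(x, rng)       # proposal = hidden-state dynamics
        u = simulate_obs(x, rng)                # auxiliary data, shape (N, d_y)
        hit = np.max(np.abs(u - y_n), axis=1) < eps
        w = w_prev * hit
        total = w.sum()
        if total == 0.0:
            return None                         # the filter has collapsed
        w = w / total
        history.append((x.copy(), w.copy()))
        if 1.0 / np.sum(w ** 2) < N / 2:        # effective sample size test
            x = x[rng.choice(N, size=N, p=w)]
            w_prev = np.full(N, 1.0 / N)
        else:
            w_prev = w
    return history

def smallest_feasible_eps(eps_grid, run):
    """Smallest eps in the grid for which run(eps) does not collapse."""
    for eps in sorted(eps_grid):
        if run(eps) is not None:
            return eps
    return None
```

Because the weights are indicators, a run "fails" exactly when every particle misses the ball around $y_n$; since this is random, a preliminary search of this kind only gives an empirical choice of $\epsilon$.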

To conclude the numerical study, we consider the time dependence of the bias. We apply SMC (targeting the exact HMM) and SMC ABC, in both cases with forward-only smoothing, as the time horizon increases over $n \in \{10, 20, \ldots, 100\}$, for $d \in \{1, 2, 5, 10\}$. The algorithms are run with $N = 1000$. RSMC is not considered.



5.2.2 Results 

The exact and ABC forward smoothing errors, $e_n^N$ and $e_n^{N,\epsilon}$ respectively, are presented in Figure 1; the mean errors obtained across the 50 runs are displayed, along with their standard errors. Under the ABC HMM, as one would expect, in almost all of the plots the error $e_n^{N,\epsilon}$ does not decrease with increasing $N$ (the ABC bias persists), but the variability of the estimates falls, i.e. the SMC component of $e_n^{N,\epsilon}$ is being controlled. In the plots for $d \in \{1, 2\}$, the exact implementation outperforms the ABC approximation in terms of accuracy, as one would expect: the means and standard errors of the exact smoothing errors $e_n^N$ are smaller than those of the ABC smoothing errors $e_n^{N,\epsilon}$ for the vast majority of the values of $N$. Interestingly, as the dimension increases ($d \in \{5, 10\}$), the ABC estimates appear to be more accurate than their SMC counterparts (at least for this function). One might explain this as follows. For SMC in high dimensions, one often requires $N = \mathcal{O}(\kappa^d)$ ($\kappa > 1$) for some stability, but this is not the case for ABC; see [5] and the references therein. These (empirical) results suggest that ABC is a viable approximation technique in higher dimensions, where it can be difficult to find SMC techniques that always work well.
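The exponential growth of $N$ with $d$ can be illustrated on a one-step toy problem that is not the model of this section: draw particles from a $N(0, I_d)$ "prior" and weight them by a unit-variance Gaussian likelihood of $y = 0$. In this toy case the effective sample size per particle tends to $(\sqrt{3}/2)^d \approx 0.866^d$, so keeping the ESS fixed requires $N$ to grow like $\kappa^d$ with $\kappa = 2/\sqrt{3}$. The function name and settings are illustrative assumptions.

```python
import numpy as np

def ess_fraction(d, N=20000, seed=0):
    """ESS/N of one-step importance weights in a d-dimensional toy problem:
    particles x ~ N(0, I_d) (the 'prior' proposal), weight exp(-|x - y|^2 / 2)
    with y = 0 (unit-variance Gaussian likelihood)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(N, d))
    logw = -0.5 * np.sum(x ** 2, axis=1)
    w = np.exp(logw - logw.max())       # subtract max for numerical stability
    w /= w.sum()
    return 1.0 / (N * np.sum(w ** 2))

# In this toy case ESS/N is close to (sqrt(3)/2)**d, about 0.866**d
for d in (1, 2, 5, 10):
    print(d, round(ess_fraction(d), 3), round((3 ** 0.5 / 2) ** d, 3))
```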

In Figures 2 and 3 we compare the performance of SMC and RSMC for estimating the smoothing expectation under the ABC HMM. Figure 2 presents the means and standard errors of the ABC smoothing errors that correspond to estimates calculated using the SMC method and the RSMC method; the distinction is made through a further subscript, with the ABC smoothing errors corresponding to the RSMC estimates denoted $e_n^{N,\epsilon,R}$. For clarity of presentation, we only display the results for 7 of the 10 values of $N$ which were run. From Figure 2 it is noted that the accuracies of the SMC and RSMC procedures for ABC forward smoothing are very comparable, with the RSMC estimates even appearing to offer a marginal improvement over the SMC estimates in terms of mean smoothing error. The standard errors of $e_n^{N,\epsilon}$ and $e_n^{N,\epsilon,R}$ are more clearly presented in Figure 3. This figure shows that, under the ABC HMM, the variability of the RSMC procedure seems to be slightly lower than that of the SMC procedure, especially as $N$ grows. In addition, the observed run times for the ABC forward smoothing procedure were consistently lower when using the RSMC method than when using the SMC ABC approach. These results suggest, at least under the criteria considered, that when using forward smoothing to perform inference with respect to the ABC approximation of the HMM, RSMC is not only a viable alternative but could be preferable to SMC with dynamic resampling.

In Figure 4 we consider the time dependence of the error $e_n^{N,\epsilon}$ associated with the SMC method applied to the ABC HMM. We observe that in this scenario there is no obvious increase in the overall error $e_n^{N,\epsilon}$ with time, for this particular estimate associated to the smoothing distribution. This is consistent with our theoretical results, which show that the error grows no worse than linearly with time. As expected on the basis of the results above, the quality of the SMC approximation appears to deteriorate (for $N$ fixed) as the dimension grows, but such a deterioration is less obvious for the ABC approximation.



5.3 PMMH 

5.3.1 Implementation Details 

As for the smoothing, we estimate the expected mean state over the observation period [0, 100], as well as the static parameters $\theta = (\sigma_x, \sigma_y)$, with priors as in [2]. We set $d = 1$ throughout and the data are the same as for the smoothing experiment (when $d = 1$). To obtain a proxy for the true value, we ran a PMMH algorithm as described in [2] for 50000 iterations with 20000 particles (no forward smoothing) and averaged the results over 50 runs. When no forward smoothing is used, only the selected particle (see Section 3.5) is used for the estimate of a smoothed additive functional.

Figure 1: Mean and standard deviations of the smoothing errors associated with the estimates of the mean state ($v_p(x_p) = x_p/101$) obtained over 50 independent implementations of the forward smoothing procedure targeting the true HMM ($e_n^N$, black) and its ABC approximation ($e_n^{N,\epsilon}$, red). The horizontal axis represents the 10 different values of $N$ for which we ran both algorithms.

The ABC approximation is as for the smoothing example (that is, the kernel is the indicator $\mathbb{I}_{\{u:\,|u - y_p| < \epsilon\}}(u)$). To allow direct comparison with an exact PMMH algorithm (that is, one which uses a dynamic resampling SMC algorithm on the true HMM), we only adopt an SMC ABC algorithm, i.e. we do not consider the use of RSMC here. For the SMC and SMC ABC, the hidden state dynamics are used as proposals. The PMMH proposal on the parameters is as in [2]. We run the algorithms for 50000 iterations with a burn-in of 10000 iterations. In addition, 5 different values of $N$ are considered, $N \in \{100, 200, \ldots, 500\}$, for the forward-only smoothing approaches. For comparison with PMMH algorithms that do not use all the particles (i.e. without forward smoothing, so that the computational cost of the SMC algorithm is $\mathcal{O}(N)$), numbers of particles with similar computational costs are run; these were $\{4427, 17139, 39020, 68258, 107007\}$.
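The PMMH mechanics referred to above can be sketched on a hypothetical one-dimensional random-walk model. This is not the algorithm of [2] verbatim: the model, the $N(0,1)$ priors on $\log\sigma_x$ and $\log\sigma_y$ and the random-walk step size are assumptions made only for illustration, and no forward smoothing is used (so each likelihood estimate costs $\mathcal{O}(N)$ per time step). The key PMMH property shown is that the likelihood estimate of the current state is stored and reused, never recomputed.

```python
import numpy as np

def bootstrap_loglik(y, sx, sy, N, rng):
    """Bootstrap particle filter estimate of log p(y_{0:T} | sx, sy) for the
    hypothetical model X_0 ~ N(0, sx^2), X_n = X_{n-1} + sx * W_n,
    Y_n = X_n + sy * V_n, with W_n, V_n iid N(0, 1)."""
    x = sx * rng.normal(size=N)
    loglik = 0.0
    for n, y_n in enumerate(y):
        if n > 0:
            x = x + sx * rng.normal(size=N)          # propagate with the dynamics
        logg = -0.5 * ((y_n - x) / sy) ** 2 - np.log(sy * np.sqrt(2.0 * np.pi))
        m = logg.max()
        w = np.exp(logg - m)
        loglik += m + np.log(w.mean())               # log of the average weight
        x = x[rng.choice(N, size=N, p=w / w.sum())]  # multinomial resampling
    return loglik

def pmmh(y, n_iter, N, rng, step=0.1):
    """PMMH for theta = (sx, sy) with a Gaussian random walk on log(theta).
    Hypothetical prior: independent N(0, 1) on log(sx) and log(sy)."""
    def log_prior(log_theta):
        return -0.5 * np.sum(log_theta ** 2)

    log_theta = np.zeros(2)
    loglik = bootstrap_loglik(y, *np.exp(log_theta), N, rng)
    chain = np.empty((n_iter, 2))
    accepted = 0
    for k in range(n_iter):
        prop = log_theta + step * rng.normal(size=2)
        loglik_prop = bootstrap_loglik(y, *np.exp(prop), N, rng)
        # symmetric proposal, so only likelihood estimates and priors enter
        log_alpha = loglik_prop + log_prior(prop) - loglik - log_prior(log_theta)
        if np.log(rng.uniform()) < log_alpha:
            # keep the accepted likelihood estimate; it is never recomputed
            log_theta, loglik = prop, loglik_prop
            accepted += 1
        chain[k] = np.exp(log_theta)
    return chain, accepted / n_iter
```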

As with SMC smoothing, the accuracy of the PMMH procedures in estimating the smoothing expectation $\mathbb{E}[V_n(X_{0:n}) \mid y_{0:n}]$ is measured using $e_n^N$ and $e_n^{N,\epsilon}$. As above, these errors are calculated as the (dimension-averaged) $\mathbb{L}_1$-errors of the PMMH estimates under the exact and ABC HMM, respectively. All results are repeated over 50 independent runs.

5.3.2 Results 



Our results are displayed in Figures 5-7. In Figure 5 we can observe the accuracy of PMMH estimation of the smoothed additive functional, using SMC updates both with and without forward smoothing, under both the exact HMM and its ABC approximation. Here we observe the expected pattern: the use of forward-only smoothing in the PMMH update scheme significantly enhances estimation accuracy for roughly the same computational cost (the accuracy is better and the variance lower). When using forward smoothing in the SMC update mechanism, we further observe that the ABC HMM can be targeted with reasonable accuracy. Consider the effect of increasing $N$ on the errors in Figure 5. Interestingly, the improvement in estimation is more evident when using forward-only smoothing, even though one expects a PMMH algorithm with more particles to mix better (see e.g. [1]) and thus the estimation to be improved.

Figure 2: Mean and standard errors of the smoothing errors associated with the estimates of the mean state ($v_p(x_p) = x_p/101$) obtained over 50 independent implementations of the forward smoothing SMC ($e_n^{N,\epsilon}$, red) and RSMC ($e_n^{N,\epsilon,R}$, blue) procedures targeting the ABC approximation of the smoothing distribution.

In terms of the estimation of parameters, we consider Figures 6 and 7. Here, we are mainly concerned with the quality of parameter estimation under the ABC HMM without forward smoothing, since forward smoothing cannot contribute anything to parameter estimation here. The ABC estimates are, in general, quite biased, by up to 40% of the parameter values. Their variance is also quite substantial relative to the exact approach.

5.4 Conclusions 

On the basis of our numerical study we can tentatively conclude the following. For smoothing: 

• In higher dimensions, ABC, in terms of accuracy, is competitive with using standard SMC (even if the model 
is analytically tractable); 

• For ABC with the kernel specified as an indicator function, one would prefer the RSMC procedure to an SMC procedure with dynamic resampling.

For the use of PMMH for batch parameter estimation, it would appear that, for moderate-length time series, forward-only smoothing is not necessarily useful. If one is interested in the estimation of smoothed additive functionals, however, the use of forward smoothing can provide significant improvements in estimation accuracy (for the same computational cost) when compared to the PMMH procedure in [2]. The ABC procedure produces parameter estimates which are perhaps more biased than its estimates of smoothed additive functionals, but this is also linked to the fact that the estimation method used is focussed on the latter quantities. These conclusions, of course, cannot be comprehensive, as they depend on the model and on the quantity being estimated. However, we have seen similar trends in different examples and in different parameter settings for the same model.

6 Summary 

In this article we have investigated smoothing and static parameter estimation for HMMs with intractable likelihoods. We have constructed SMC- and PMCMC-based solutions for ABC approximations and investigated the bias associated to our procedure. There are several extensions to the work that has been considered here. From the perspective of parameter estimation, we have only considered batch estimation by using PMCMC. In many practical problems, one is often interested in performing statistical inference as samples arrive online. We are currently investigating methodology for this problem in [HO HE], and the theoretical and empirical work here is of great relevance to these latter ideas; in particular when applying the online EM algorithm as considered in [28]. In this article we have focussed upon the forward-only smoothing technique in [16]; however, this is not the only possibility, and one can also investigate the ideas in [5] in the context of ABC. In particular, the relative performance of these procedures is of interest.

Figure 3: Standard errors of the smoothing errors associated with the estimates of the mean state ($v_p(x_p) = x_p/101$) obtained over 50 independent implementations of the forward smoothing SMC ($e_n^{N,\epsilon}$, red) and RSMC ($e_n^{N,\epsilon,R}$, blue) procedures targeting the ABC approximation of the smoothing distribution. These standard errors are also displayed in Figure 2.

A SMC Error

We give an analysis of the self-normalized estimate, which was not the objective in [TH], and which is explicit in $\epsilon$. To that end, we introduce the following notation, to keep it consistent with [151 US], on which our analysis relies. We set:

$$H((x,u),(x',u')) = g(x',u')\,f(x,x')$$

and write $G_{n,\epsilon}(u)$ for the ABC potential at time $n$.

To avoid notational overload, we will simply write $H(x,x')$, $x \in \mathbb{R}^{d_x} \times \mathbb{R}^{d_y} =: E$, and $G_{n,\epsilon}(x)$, $x \in E$ (despite $G_{n,\epsilon}$ not depending on the $\mathbb{R}^{d_x}$ component).

Figure 4: Time dependence (horizontal axis) of the smoothing errors associated with the estimates of the mean state ($v_p(x_p) = x_p/(n+1)$, $n = 10, \ldots, 100$) obtained over 50 independent implementations of the forward smoothing SMC ($e_n^N$, black) and ABC ($e_n^{N,\epsilon}$, red) procedures targeting the true and ABC approximations of the smoothing distribution, respectively.



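We consider the approximation of the path measure $\mathbb{Q}_{n,\epsilon}(V_n)$, which is by definition:

$$\mathbb{Q}_{n,\epsilon}(V_n) := \frac{1}{\gamma_{n,\epsilon}(1)} \int_{E^{n+1}} \Big[\prod_{p=0}^{n-1} G_{p,\epsilon}(x_p)\Big] V_n(x_{0:n})\,\eta_0(x_0) \prod_{p=1}^{n} H(x_{p-1}, x_p)\,dx_{0:n}$$

where

$$\gamma_{n,\epsilon}(f) = \int_{E^{n+1}} \Big[\prod_{p=0}^{n-1} G_{p,\epsilon}(x_p)\Big] f(x_n)\,\eta_0(x_0) \prod_{p=1}^{n} H(x_{p-1}, x_p)\,dx_{0:n}$$

and $f : E \to \mathbb{R}$. Recall that, for additive functionals $V_n(x_{0:n}) = \sum_{p=0}^{n} v_p(x_p)$, the forward-only recursion is

$$V_{n,\epsilon}^N(x) = v_n(x) + \sum_{i=1}^{N} \frac{G_{n-1,\epsilon}(x_{n-1}^i)\,H(x_{n-1}^i, x)}{\sum_{j=1}^{N} G_{n-1,\epsilon}(x_{n-1}^j)\,H(x_{n-1}^j, x)}\,V_{n-1,\epsilon}^N(x_{n-1}^i)$$

with $V_{0,\epsilon}^N = v_0$; cf. [16]. We remind the reader that $v_p$ is a function on $\mathbb{R}^{d_x}$ only.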
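One step of this recursion can be sketched in NumPy as below. This is a minimal illustration, not the implementation of [16]; it assumes one-dimensional states, a vectorized transition density `f`, and the hypothetical function name `forward_smoothing_step`. Since $H((x,u),(x',u')) = g(x',u')f(x,x')$ and the factor $g(x',u')$ depends only on the new particle, it cancels between numerator and denominator, so only $f$ and the potentials $G_{n-1,\epsilon}$ are needed; the observation density $g$ is never evaluated.

```python
import numpy as np

def forward_smoothing_step(x_prev, G_prev, V_prev, x_new, v_n, f):
    """One step of the forward-only recursion for V_{n,eps}^N.

    x_prev, G_prev, V_prev: particles at time n-1, their ABC potentials
    G_{n-1,eps}(x_prev^i) and current values V_{n-1,eps}^N(x_prev^i).
    x_new: particles at time n. v_n: vectorized additive term.
    f(a, b): vectorized transition density f(a, b) of the hidden chain.
    The factor g(x', u') in H depends only on the new particle, so it cancels
    between numerator and denominator and only f is needed here.
    Cost is O(N^2): every new particle is compared with every old one.
    """
    # K[i, k] = G_{n-1}(x_prev^i) * f(x_prev^i, x_new^k)
    K = G_prev[:, None] * f(x_prev[:, None], x_new[None, :])
    denom = K.sum(axis=0)
    # guard against 0/0 when no weighted old particle can reach x_new^k
    weights = np.divide(K, denom, out=np.zeros_like(K), where=denom > 0)
    return v_n(x_new) + weights.T @ V_prev

# Usage: 1-d states, Gaussian kernel (its normalizing constant cancels)
f = lambda a, b: np.exp(-0.5 * (b - a) ** 2)
x_prev = np.linspace(-1.0, 1.0, 5)
G_prev = np.array([1.0, 0.0, 1.0, 1.0, 0.0])      # indicator ABC potentials
V_prev = np.full(5, 3.0)                           # e.g. after 3 steps of v = 1
x_new = np.linspace(-1.0, 1.0, 4)
V_new = forward_smoothing_step(x_prev, G_prev, V_prev, x_new, np.ones_like, f)
```

Because each column of `weights` sums to one, `V_new` equals 4 everywhere here, which is the same convexity property used below to show $|V_{n,\epsilon}^N| \le \sum_{p=0}^n \|v_p\|$.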

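Theorem A.1. Assume (A1). Then for any $1 \le p < +\infty$, there exists $a_p < +\infty$ such that for any $\epsilon > 0$, $N \ge 1$, $n \ge 1$, $y_{0:n}$:

$$\left\| \frac{\sum_{i=1}^N G_{n,\epsilon}(x_n^i)\,V_{n,\epsilon}^N(x_n^i)}{\sum_{i=1}^N G_{n,\epsilon}(x_n^i)} - \mathbb{E}_\epsilon[V_n(X_{0:n}) \mid y_{0:n}] \right\|_p \le \frac{a_p\,(n+1)\,\bar\delta(\epsilon)}{\sqrt{N}}$$

where $\bar\delta(\epsilon) = \max\{\delta(\epsilon)\max\{\delta(\epsilon)^2, \delta(\epsilon)^4\},\ \delta(\epsilon)^3\}$, with $\delta(\epsilon)$ as in (A1), and $\mathbb{E}_\epsilon[\cdot \mid y_{0:n}]$ is the expectation w.r.t. the joint ABC smoothing distribution.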



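Proof. Consider the decomposition:

$$\frac{\sum_{i=1}^N G_{n,\epsilon}(x_n^i)\,V_{n,\epsilon}^N(x_n^i)}{\sum_{i=1}^N G_{n,\epsilon}(x_n^i)} - \frac{\mathbb{Q}_{n,\epsilon}(G_{n,\epsilon}V_n)}{\eta_{n,\epsilon}(G_{n,\epsilon})} = \frac{\frac{1}{N}\sum_{i=1}^N G_{n,\epsilon}(x_n^i)\,V_{n,\epsilon}^N(x_n^i)}{\eta_{n,\epsilon}(G_{n,\epsilon})\,\frac{1}{N}\sum_{j=1}^N G_{n,\epsilon}(x_n^j)}\left[\eta_{n,\epsilon}(G_{n,\epsilon}) - \frac{1}{N}\sum_{j=1}^N G_{n,\epsilon}(x_n^j)\right] + \frac{1}{\eta_{n,\epsilon}(G_{n,\epsilon})}\left[\frac{1}{N}\sum_{i=1}^N G_{n,\epsilon}(x_n^i)\,V_{n,\epsilon}^N(x_n^i) - \mathbb{Q}_{n,\epsilon}(G_{n,\epsilon}V_n)\right] \qquad (14)$$

where we recall the normalized $n$-time marginal $\eta_{n,\epsilon}(f) = \gamma_{n,\epsilon}(f)/\gamma_{n,\epsilon}(1)$. Note that

$$\frac{\mathbb{Q}_{n,\epsilon}(G_{n,\epsilon}V_n)}{\eta_{n,\epsilon}(G_{n,\epsilon})} = \frac{1}{\gamma_{n,\epsilon}(G_{n,\epsilon})} \int_{E^{n+1}} \Big[\prod_{p=0}^{n} G_{p,\epsilon}(x_p)\Big] V_n(x_{0:n})\,\eta_0(x_0) \prod_{p=1}^{n} H(x_{p-1}, x_p)\,dx_{0:n} = \mathbb{E}_\epsilon[V_n(X_{0:n}) \mid y_{0:n}]$$

which is the quantity of interest.

Figure 5: Standard errors of the smoothing errors associated with the estimates of the mean state obtained over 50 independent implementations. The PMCMC with exact SMC is in black, the ABC in red. The dotted lines indicate the usage of forward-only smoothing. The dotted horizontal line is the estimated true value.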



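To consider an $\mathbb{L}_p$-analysis, we can split the two terms in (14) via Minkowski's inequality. We consider the first term:

$$\frac{\frac{1}{N}\sum_{i=1}^N G_{n,\epsilon}(x_n^i)\,V_{n,\epsilon}^N(x_n^i)}{\eta_{n,\epsilon}(G_{n,\epsilon})\,\frac{1}{N}\sum_{j=1}^N G_{n,\epsilon}(x_n^j)}\left[\eta_{n,\epsilon}(G_{n,\epsilon}) - \frac{1}{N}\sum_{j=1}^N G_{n,\epsilon}(x_n^j)\right].$$

Now, one has $|V_{n,\epsilon}^N(x_n)| \le \sum_{p=0}^{n} \|v_p\|$. This is proved by induction, the initialization with $n = 0$ being obvious. Assuming the result for $n-1$, one has

$$V_{n,\epsilon}^N(x) = v_n(x) + \sum_{i=1}^{N} \frac{G_{n-1,\epsilon}(x_{n-1}^i)\,H(x_{n-1}^i, x)}{\sum_{j=1}^{N} G_{n-1,\epsilon}(x_{n-1}^j)\,H(x_{n-1}^j, x)}\,V_{n-1,\epsilon}^N(x_{n-1}^i)$$

and, since the weights in the sum are non-negative and sum to one, $|V_{n,\epsilon}^N(x)| \le \|v_n\| + \sum_{p=0}^{n-1} \|v_p\|$; one easily concludes. Thus, it follows: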

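$$\left|\frac{\frac{1}{N}\sum_{i=1}^N G_{n,\epsilon}(x_n^i)\,V_{n,\epsilon}^N(x_n^i)}{\eta_{n,\epsilon}(G_{n,\epsilon})\,\frac{1}{N}\sum_{j=1}^N G_{n,\epsilon}(x_n^j)}\left[\eta_{n,\epsilon}(G_{n,\epsilon}) - \frac{1}{N}\sum_{j=1}^N G_{n,\epsilon}(x_n^j)\right]\right| \le (n+1)\max_{0 \le p \le n}\|v_p\|\,\delta(\epsilon)\,a(\epsilon)^{-1}\left|\eta_{n,\epsilon}(G_{n,\epsilon}) - \frac{1}{N}\sum_{j=1}^N G_{n,\epsilon}(x_n^j)\right|.$$

By Theorem 7.4.4 of [14], there exists some $a_p < \infty$ such that

$$\left\|\eta_{n,\epsilon}(G_{n,\epsilon}) - \frac{1}{N}\sum_{j=1}^N G_{n,\epsilon}(x_n^j)\right\|_p \le \frac{a_p\,\delta(\epsilon)^2\,a(\epsilon)}{\sqrt{N}},$$

hence

$$\left\|\frac{\frac{1}{N}\sum_{i=1}^N G_{n,\epsilon}(x_n^i)\,V_{n,\epsilon}^N(x_n^i)}{\eta_{n,\epsilon}(G_{n,\epsilon})\,\frac{1}{N}\sum_{j=1}^N G_{n,\epsilon}(x_n^j)}\left[\eta_{n,\epsilon}(G_{n,\epsilon}) - \frac{1}{N}\sum_{j=1}^N G_{n,\epsilon}(x_n^j)\right]\right\|_p \le \frac{a_p\,(n+1)\,\delta(\epsilon)^3}{\sqrt{N}}$$

for some $a_p < \infty$ that does not depend upon $n$ or $\epsilon$. To deal with the second term in (14), one can use Lemma A.1 along with (A1) to conclude. □

Figure 6: Standard errors of the smoothing errors associated with the estimates of $\sigma_x$ obtained over 50 independent implementations. The PMCMC with exact SMC is in black, the ABC in red. The dotted lines indicate the usage of forward-only smoothing. The dotted horizontal line is the estimated true value.

A.1 Technical Result

We provide a proof of Lemma A.1. To that end, introduce the operator: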



$$D_{p,n,\epsilon}(V_n)(x_p) = \int \mathbb{M}_{p,\epsilon}^N(x_p, dx_{0:p-1})\,Q_{p,n,\epsilon}(x_p, dx_{p+1:n})\,V_n(x_{0:n})$$

where

$$\mathbb{M}_{p,\epsilon}^N(x_p, dx_{0:p-1}) = \prod_{q=1}^{p} M_{q,\eta_{q-1,\epsilon}^N}(x_q, dx_{q-1}), \qquad Q_{p,n,\epsilon}(x_p, dx_{p+1:n}) = \prod_{q=p+1}^{n} Q_{q,\epsilon}(x_{q-1}, dx_q),$$

$$M_{q,\eta_{q-1,\epsilon}^N}(x_q, dx_{q-1}) = \frac{\eta_{q-1,\epsilon}^N(dx_{q-1})\,G_{q-1,\epsilon}(x_{q-1})\,H(x_{q-1}, x_q)}{\eta_{q-1,\epsilon}^N\big(G_{q-1,\epsilon}H(\cdot, x_q)\big)}, \qquad Q_{q,\epsilon}(x_{q-1}, dx_q) = G_{q-1,\epsilon}(x_{q-1})\,H(x_{q-1}, x_q)\,dx_q,$$

$$\eta_{q-1,\epsilon}^N = \frac{1}{N}\sum_{i=1}^{N} \delta_{x_{q-1}^i},$$

where all the conventions of [15] are preserved. In the empirical measure in the final line, one considers the mutated particles. We also use the convention $Q_{p,n,\epsilon} = Q_{p+1,\epsilon} \cdots Q_{n,\epsilon}$. Note that, for the backward kernel, when considering the filter $\hat\eta_{q-1,\epsilon}$ one can write

$$M_{q,\hat\eta_{q-1,\epsilon}}(x_q, dx_{q-1}) = \frac{\hat\eta_{q-1,\epsilon}(dx_{q-1})\,H(x_{q-1}, x_q)}{\hat\eta_{q-1,\epsilon}\big(H(\cdot, x_q)\big)}.$$

Figure 7: Standard errors of the smoothing errors associated with the estimates of $\sigma_y$ obtained over 50 independent implementations. The PMCMC with exact SMC is in black, the ABC in red. The dotted lines indicate the usage of forward-only smoothing. The dotted horizontal line is the estimated true value.

Lemma A.1. Assume (A1). Then for any $1 \le p < +\infty$ there exists $a_p < +\infty$ such that for any $\epsilon > 0$, $N \ge 1$, $n \ge 1$, $y_{0:n}$:

$$\left\| \frac{1}{N}\sum_{i=1}^N G_{n,\epsilon}(x_n^i)\,V_{n,\epsilon}^N(x_n^i) - \mathbb{Q}_{n,\epsilon}(G_{n,\epsilon}V_n) \right\|_p \le \frac{a_p\,(n+1)\,\tilde\delta(\epsilon)\,a(\epsilon)}{\sqrt{N}}$$

where $\tilde\delta(\epsilon) = \max\{\delta(\epsilon)^2, \delta(\epsilon)^4\}$, with $\delta(\epsilon)$ and $a(\epsilon)$ as in (A1).

Proof. The proof of this result follows the proof of Theorem 3.2 of [15]. The complications are the control of the oscillations of $P_{p,n,\epsilon}^N(G_{n,\epsilon}V_n) = D_{p,n,\epsilon}(G_{n,\epsilon}V_n)/D_{p,n,\epsilon}(1)$ with a different function, as well as obtaining a rate w.r.t. $\epsilon$. It is remarked that the application of the Khintchine-type inequality in Theorem 3.2 of [15] will not add any dependence upon $\epsilon$.

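Using the definition of $D_{p,n,\epsilon}$, one has

$$D_{p,n,\epsilon}(G_{n,\epsilon}V_n) = Q_{p,n,\epsilon}(G_{n,\epsilon}) \sum_{q=0}^{p} M_{p,\eta_{p-1,\epsilon}^N} \cdots M_{q+1,\eta_{q,\epsilon}^N}(v_q) + \sum_{q=p+1}^{n} Q_{p,q,\epsilon}\big(v_q\,Q_{q,n,\epsilon}(G_{n,\epsilon})\big),$$

where empty products of kernels (the case $q = p$) are the identity. Thus it follows that:

$$P_{p,n,\epsilon}^N(G_{n,\epsilon}V_n) = \frac{Q_{p,n,\epsilon}(G_{n,\epsilon})}{Q_{p,n,\epsilon}(1)} \sum_{q=0}^{p-1} M_{p,\eta_{p-1,\epsilon}^N} \cdots M_{q+1,\eta_{q,\epsilon}^N}(v_q) + \sum_{q=p}^{n} \frac{S_{p,q,\epsilon}\big(\bar Q_{q,n,\epsilon}(G_{n,\epsilon})\,v_q\big)}{S_{p,q,\epsilon}\big(\bar Q_{q,n,\epsilon}(1)\big)} \qquad (15)$$

where $S_{p,q,\epsilon}(v) = Q_{p,q,\epsilon}(v)/Q_{p,q,\epsilon}(1)$ and $\bar Q_{q,n,\epsilon}(v) = Q_{q,n,\epsilon}(v)/\eta_{q,\epsilon}(Q_{q,n,\epsilon}(1))$. To deal with the oscillations of $P_{p,n,\epsilon}^N(G_{n,\epsilon}V_n)$, we consider both sums separately.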

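We begin with the first term on the R.H.S. of (15); in particular, the difference with arguments $x$ and $y$:

$$\frac{Q_{p,n,\epsilon}(G_{n,\epsilon})(x)}{Q_{p,n,\epsilon}(1)(x)} \sum_{q=0}^{p-1} M_{p,\eta_{p-1,\epsilon}^N} \cdots M_{q+1,\eta_{q,\epsilon}^N}(v_q)(x) - \frac{Q_{p,n,\epsilon}(G_{n,\epsilon})(y)}{Q_{p,n,\epsilon}(1)(y)} \sum_{q=0}^{p-1} M_{p,\eta_{p-1,\epsilon}^N} \cdots M_{q+1,\eta_{q,\epsilon}^N}(v_q)(y).$$

Then, treating the summands, one has

$$S_{p,n,\epsilon}(G_{n,\epsilon})(x)\Big[M_{p,\eta_{p-1,\epsilon}^N} \cdots M_{q+1,\eta_{q,\epsilon}^N}(v_q)(x) - M_{p,\eta_{p-1,\epsilon}^N} \cdots M_{q+1,\eta_{q,\epsilon}^N}(v_q)(y)\Big] + \Big[S_{p,n,\epsilon}(G_{n,\epsilon})(x) - S_{p,n,\epsilon}(G_{n,\epsilon})(y)\Big]\,M_{p,\eta_{p-1,\epsilon}^N} \cdots M_{q+1,\eta_{q,\epsilon}^N}(v_q)(y)$$

which is clearly upper-bounded in absolute value by

$$2\,\|G_{n,\epsilon}\|\,\|v_q\|\,\Big[\beta\big(M_{p,\eta_{p-1,\epsilon}^N} \cdots M_{q+1,\eta_{q,\epsilon}^N}\big) + \beta(S_{p,n,\epsilon})\Big]. \qquad (16)$$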



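Now consider the second term on the R.H.S. of (15). In particular, for each summand, one has, after subtracting the function in $y$ from that in $x$,

$$\frac{S_{p,q,\epsilon}\big(\bar Q_{q,n,\epsilon}(G_{n,\epsilon})v_q\big)(x) - S_{p,q,\epsilon}\big(\bar Q_{q,n,\epsilon}(G_{n,\epsilon})v_q\big)(y)}{S_{p,q,\epsilon}\big(\bar Q_{q,n,\epsilon}(1)\big)(x)} + S_{p,q,\epsilon}\big(\bar Q_{q,n,\epsilon}(G_{n,\epsilon})v_q\big)(y)\,\frac{S_{p,q,\epsilon}\big(\bar Q_{q,n,\epsilon}(1)\big)(y) - S_{p,q,\epsilon}\big(\bar Q_{q,n,\epsilon}(1)\big)(x)}{S_{p,q,\epsilon}\big(\bar Q_{q,n,\epsilon}(1)\big)(y)\,S_{p,q,\epsilon}\big(\bar Q_{q,n,\epsilon}(1)\big)(x)}.$$

Dealing with the two terms separately, one can easily show that the first term is upper-bounded by $2\,b_{q,n,\epsilon}\,\|v_q\|\,\|G_{n,\epsilon}\|\,\beta(S_{p,q,\epsilon})$, where $b_{q,n,\epsilon} = \sup_{x,y} Q_{q,n,\epsilon}(1)(x)/Q_{q,n,\epsilon}(1)(y)$. Similarly, using trivial manipulations, one can also show that the second term is upper-bounded by the same term, yielding the upper-bound

$$4\,b_{q,n,\epsilon}\,\|v_q\|\,\|G_{n,\epsilon}\|\,\beta(S_{p,q,\epsilon}). \qquad (17)$$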



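Combining the bounds (16)-(17), one can deduce that:

$$\mathrm{osc}\big(P_{p,n,\epsilon}^N(G_{n,\epsilon}V_n)\big) \le 2\|G_{n,\epsilon}\| \sum_{q=0}^{p-1} \|v_q\|\Big[\beta\big(M_{p,\eta_{p-1,\epsilon}^N} \cdots M_{q+1,\eta_{q,\epsilon}^N}\big) + \beta(S_{p,n,\epsilon})\Big] + 4\|G_{n,\epsilon}\| \sum_{q=p}^{n} \|v_q\|\,b_{q,n,\epsilon}\,\beta(S_{p,q,\epsilon}).$$

Thus, one has, via the proof of Theorem 3.2 of [15]:

$$\left\| \frac{1}{N}\sum_{i=1}^N G_{n,\epsilon}(x_n^i)\,V_{n,\epsilon}^N(x_n^i) - \mathbb{Q}_{n,\epsilon}(G_{n,\epsilon}V_n) \right\|_p \le \frac{a_p}{\sqrt{N}} \sum_{p=0}^{n} C_{p,n,\epsilon}^N$$

where

$$C_{p,n,\epsilon}^N = \mathbb{E}\left[2\|G_{n,\epsilon}\| \sum_{q=0}^{p-1} \|v_q\|\Big[\beta\big(M_{p,\eta_{p-1,\epsilon}^N} \cdots M_{q+1,\eta_{q,\epsilon}^N}\big) + \beta(S_{p,n,\epsilon})\Big] + 4\|G_{n,\epsilon}\| \sum_{q=p}^{n} \|v_q\|\,b_{q,n,\epsilon}\,\beta(S_{p,q,\epsilon})\right].$$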

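To complete the proof we need to consider the term $C_{p,n,\epsilon}^N$; to that end, we quote the following bounds, which follow from (A1); see [15] and the citations therein for details:

$$b_{p,n,\epsilon} \le \delta(\epsilon)\,\rho^4, \qquad \beta(S_{p,q,\epsilon}) \le \big(1 - \delta(\epsilon)^{-1}\rho^{-8}\big)^{q-p}, \qquad \beta\big(M_{p,\eta_{p-1,\epsilon}^N} \cdots M_{q+1,\eta_{q,\epsilon}^N}\big) \le \big(1 - \rho^{-8}\big)^{p-q}.$$

First consider the expression

$$\sum_{p=0}^{n} \sum_{q=0}^{p-1} 2\|G_{n,\epsilon}\|\,\|v_q\|\Big[\beta\big(M_{p,\eta_{p-1,\epsilon}^N} \cdots M_{q+1,\eta_{q,\epsilon}^N}\big) + \beta(S_{p,n,\epsilon})\Big]$$

which is upper-bounded by

$$2\,a(\epsilon)\,\big(\delta(\epsilon)\rho\big)^2 \max_q \|v_q\| \sum_{p=0}^{n} \sum_{q=0}^{p-1}\Big[\big(1 - \rho^{-8}\big)^{p-q} + \big(1 - \delta(\epsilon)^{-1}\rho^{-8}\big)^{n-p}\Big]$$

with $\rho$ as in (A1). By standard manipulations, this is upper-bounded by $C\,a(\epsilon)\,\delta(\epsilon)^2\,(n+1)$ for some $C < \infty$ that does not depend upon $n$ or $\epsilon$. Second, consider the expression

$$\sum_{p=0}^{n} \sum_{q=p}^{n} 4\|G_{n,\epsilon}\|\,\|v_q\|\,b_{q,n,\epsilon}\,\beta(S_{p,q,\epsilon})$$

which, by the bounds above, is upper-bounded by a multiple of $\sum_{p=0}^{n} \sum_{q=p}^{n} \big(1 - \delta(\epsilon)^{-1}\rho^{-8}\big)^{q-p}$. Again, by standard manipulations, one can upper-bound this latter expression by $C\,a(\epsilon)\,\delta(\epsilon)^4\,(n+1)$ for some $C < \infty$ that does not depend upon $n$ or $\epsilon$; we can now conclude. □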
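One ingredient of the "standard manipulations" above is a geometric-series bound. Writing $r = 1 - \rho^{-8}$ and assuming $\rho > 1$, so that $r \in (0,1)$, the contribution of the terms $(1-\rho^{-8})^{p-q}$ is controlled as follows. This sketches that single step only and does not reproduce the remaining terms of the argument:

$$\sum_{p=0}^{n}\sum_{q=0}^{p-1} r^{p-q} = \sum_{p=0}^{n}\sum_{k=1}^{p} r^{k} \le \sum_{p=0}^{n}\frac{r}{1-r} = \frac{(n+1)\,r}{1-r},$$

so this part grows linearly in $n+1$, with a constant that does not depend upon $n$.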

B ABC Error

Below we will repeatedly apply Theorem 2 of [22]. This can also be established under (A1) for smoothed ABC; the proof is omitted and follows the description in [22]. Recall that the ABC approximation of the joint smoothing density is:

$$\pi_\epsilon(x_{0:n} \mid y_{0:n}) = \frac{\prod_{i=0}^{n}\Big[\int \mathbb{I}_{B_\epsilon(y_i)}(u)\,g(x_i,u)\,du\Big]\,\eta_0(x_0)\prod_{i=1}^{n} f(x_{i-1},x_i)}{\int_{\mathbb{R}^{(n+1)d_x}} \prod_{i=0}^{n}\Big[\int \mathbb{I}_{B_\epsilon(y_i)}(u)\,g(x_i,u)\,du\Big]\,\eta_0(x_0)\prod_{i=1}^{n} f(x_{i-1},x_i)\,dx_{0:n}}$$

where $B_\epsilon(y)$ denotes the ball of radius $\epsilon$ centred at $y$. It is stressed that the analysis here is performed by integrating out the auxiliary data, whilst the SMC analysis works on the joint space of auxiliary data and hidden state.
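To see what the ABC approximation does to a single observation term, consider the hypothetical case $d_y = 1$ with Gaussian $g(x,\cdot) = N(x, \sigma_y^2)$, an assumption made only for this sketch. The inner integral $\int \mathbb{I}_{B_\epsilon(y)}(u)\,g(x,u)\,du$ is then a difference of normal CDFs; divided by the ball length $2\epsilon$ it recovers $g(x,y)$ as $\epsilon \to 0$. The factor $(2\epsilon)^{n+1}$ cancels between numerator and denominator of the smoothing density, so it does not need to be tracked. The helper names are hypothetical.

```python
from math import erf, exp, pi, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def abc_obs_term(x, y, eps, sigma_y):
    """int 1{|u - y| < eps} g(x, u) du for the hypothetical g(x, .) = N(x, sigma_y^2)."""
    return norm_cdf((y + eps - x) / sigma_y) - norm_cdf((y - eps - x) / sigma_y)

def gauss_density(x, y, sigma_y):
    """Exact observation density g(x, y) for the same hypothetical model."""
    return exp(-0.5 * ((y - x) / sigma_y) ** 2) / (sigma_y * sqrt(2.0 * pi))

# As eps -> 0, the ABC term divided by the ball length 2*eps approaches g(x, y)
for eps in (1.0, 0.1, 0.001):
    print(eps, abc_obs_term(0.3, 1.0, eps, 1.0) / (2 * eps), gauss_density(0.3, 1.0, 1.0))
```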

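Proof of Theorem 2.1. We have

$$\big|\mathbb{E}[V_n(X_{0:n}) \mid y_{0:n}] - \mathbb{E}_\epsilon[V_n(X_{0:n}) \mid y_{0:n}]\big| = \Big|\sum_{p=0}^{n} \int v_p(x_p)\,\big[\hat\eta_n(x_{p:n}) - \hat\eta_{n,\epsilon}(x_{p:n})\big]\,dx_{p:n}\Big|$$

where $\hat\eta_{n,\epsilon}(x_{p:n})$ and $\hat\eta_n(x_{p:n})$ denote the marginals on $x_{p:n}$ of the ABC and true smoothers. Using the backward representation of the smoothers, one has the decomposition of the R.H.S.:

$$\Big|\sum_{p=0}^{n} \int v_p(x_p)\,\big[\hat\eta_n(x_{p:n}) - \hat\eta_{n,\epsilon}(x_{p:n})\big]\,dx_{p:n}\Big| = \Big|\sum_{p=0}^{n} \Big[\hat\eta_n M_{n:p+1,\hat\eta_{n-1:p}}(v_p) - \hat\eta_{n,\epsilon} M_{n:p+1,\hat\eta_{n-1:p,\epsilon}}(v_p)\Big]\Big| \qquad (18)$$

where

$$\hat\eta_n M_{n:p+1,\hat\eta_{n-1:p}}(v_p) = \int \hat\eta_n(x_n) \prod_{q=p+1}^{n} M_{q,\hat\eta_{q-1}}(x_q, dx_{q-1})\,v_p(x_p)\,dx_n$$

$$\hat\eta_{n,\epsilon} M_{n:p+1,\hat\eta_{n-1:p,\epsilon}}(v_p) = \int \hat\eta_{n,\epsilon}(x_n) \prod_{q=p+1}^{n} M_{q,\hat\eta_{q-1,\epsilon}}(x_q, dx_{q-1})\,v_p(x_p)\,dx_n$$

with the backward kernels:

$$M_{q,\hat\eta_{q-1}}(x_q, dx_{q-1}) = \frac{\hat\eta_{q-1}(x_{q-1})\,f(x_{q-1}, x_q)}{\hat\eta_{q-1}\big(f(\cdot, x_q)\big)}\,dx_{q-1} \qquad (19)$$

$$M_{q,\hat\eta_{q-1,\epsilon}}(x_q, dx_{q-1}) = \frac{\hat\eta_{q-1,\epsilon}(x_{q-1})\,f(x_{q-1}, x_q)}{\hat\eta_{q-1,\epsilon}\big(f(\cdot, x_q)\big)}\,dx_{q-1}. \qquad (20)$$

We will drop the $\hat\eta$ subscripts for the remainder of the proof, writing $M_q$ and $M_{q,\epsilon}$ for the true and ABC backward kernels, respectively.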



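One can now adopt a telescoping sum decomposition for each summand of the R.H.S. of (18):

$$\sum_{s=1}^{n-p} \hat\eta_n M_{n:n-s+2,\epsilon}\big[M_{n-s+1} - M_{n-s+1,\epsilon}\big]\big(M_{n-s:p+1}(v_p)\big) + \big[\hat\eta_n - \hat\eta_{n,\epsilon}\big]\big(M_{n:p+1,\epsilon}(v_p)\big)$$

with the convention that $M_{n:n+1,\epsilon}$ and $M_{p:p+1}$ are the identity. For the second term, one can use Theorem 2 of [22]. Thus, concentrating on the summands in the first term, we have

$$\Big|\hat\eta_n M_{n:n-s+2,\epsilon}\big[M_{n-s+1} - M_{n-s+1,\epsilon}\big]\big(M_{n-s:p+1}(v_p)\big)\Big| \le \Big\|\hat\eta_n M_{n:n-s+2,\epsilon}\big[M_{n-s+1} - M_{n-s+1,\epsilon}\big]\Big\|_{tv} \times \Big(\prod_{q=p+1}^{n-s} \beta(M_q)\Big)\,\mathrm{osc}(v_p).$$

Via (A1) and Lemma B.1, it clearly follows that there exist $C < \infty$ and $\xi \in (0,1)$, which do not depend upon $n$, $y_{0:n}$ or $\epsilon$, such that

$$\Big|\hat\eta_n M_{n:n-s+2,\epsilon}\big[M_{n-s+1} - M_{n-s+1,\epsilon}\big]\big(M_{n-s:p+1}(v_p)\big)\Big| \le C\,\epsilon\,\xi^{n-p-s}.$$

As a result, we have

$$\big|\mathbb{E}[V_n(X_{0:n}) \mid y_{0:n}] - \mathbb{E}_\epsilon[V_n(X_{0:n}) \mid y_{0:n}]\big| \le C\,\epsilon \sum_{p=0}^{n}\Big[\Big(\sum_{s=1}^{n-p} \xi^{n-p-s}\Big) + 1\Big]$$

for some $C < \infty$ that does not depend upon $n$, $y_{0:n}$ or $\epsilon$. Elementary manipulations allow us to conclude. □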
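Assuming, as in the last display of the proof, that the summands decay like $\xi^{n-p-s}$ with $\xi \in (0,1)$, the elementary manipulations amount to summing a geometric series, which gives a bias of order $n\epsilon$:

$$C\epsilon\sum_{p=0}^{n}\Big[\sum_{s=1}^{n-p}\xi^{n-p-s} + 1\Big] = C\epsilon\sum_{p=0}^{n}\Big[\sum_{j=0}^{n-p-1}\xi^{j} + 1\Big] \le C\epsilon\,(n+1)\Big(\frac{1}{1-\xi} + 1\Big) = C\epsilon\,(n+1)\,\frac{2-\xi}{1-\xi}.$$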



B.1 Technical Result

Lemma B.1. Assume (A1). Then there exists $C < +\infty$ such that for any $k \in \{0, \ldots, n-2\}$, $\epsilon > 0$, $y_{0:n}$ and $\varphi \in \mathcal{B}_b(\mathbb{R}^{d_x})$ we have

$$\sup_{x}\Big|\int \big[M_{k+1,\hat\eta_{k,\epsilon}}(x, dz) - M_{k+1,\hat\eta_k}(x, dz)\big]\,\varphi(z)\Big| \le C\,\epsilon\,\|\varphi\|$$

where $M_{k+1,\hat\eta_{k,\epsilon}}$ and $M_{k+1,\hat\eta_k}$ are defined in (19)-(20), and $\hat\eta_{k,\epsilon}$ and $\hat\eta_k$ are the ABC and true filters.

Proof. We have the decomposition:

$$\int \big[M_{k+1,\hat\eta_{k,\epsilon}}(x, dz) - M_{k+1,\hat\eta_k}(x, dz)\big]\,\varphi(z) = \int \varphi(z)\,f(z, x)\,\frac{\hat\eta_{k,\epsilon}(z) - \hat\eta_k(z)}{\int \hat\eta_{k,\epsilon}(u)\,f(u, x)\,du}\,dz + \int \varphi(z)\,f(z, x)\,\hat\eta_k(z)\,\frac{\int \big[\hat\eta_k(u) - \hat\eta_{k,\epsilon}(u)\big]\,f(u, x)\,du}{\int \hat\eta_{k,\epsilon}(u)\,f(u, x)\,du \int \hat\eta_k(u)\,f(u, x)\,du}\,dz$$

where we have suppressed the data from the notation.

Dealing with the first part, we have, for some $C$ that does not depend upon $x$, $k$, $\epsilon$ or $y_{0:n}$,

$$\Big|\int \varphi(z)\,f(z, x)\,\frac{\hat\eta_{k,\epsilon}(z) - \hat\eta_k(z)}{\int \hat\eta_{k,\epsilon}(u)\,f(u, x)\,du}\,dz\Big| \le C\,\epsilon\,\|\varphi\|$$

where we have used (A1) in the denominator and to control $f$, as well as Theorem 2 of [22]. Now, for the second part,

$$\Big|\int \varphi(z)\,f(z, x)\,\hat\eta_k(z)\,\frac{\int \big[\hat\eta_k(u) - \hat\eta_{k,\epsilon}(u)\big]\,f(u, x)\,du}{\int \hat\eta_{k,\epsilon}(u)\,f(u, x)\,du \int \hat\eta_k(u)\,f(u, x)\,du}\,dz\Big| \le C\,\epsilon\,\|\varphi\|$$

where, again, (A1) has been applied along with Theorem 2 of [22], and $C$ does not depend upon $x$, $k$, $\epsilon$ or $y_{0:n}$. Using the uniformity in $x$ of the above bounds allows us to conclude. □